Amendment Under 37 C.F.R. §1.111 
U.S. Appln. No. 10/685,456



Atty. Docket No.: Q77945



AMENDMENTS TO THE DRAWINGS 

Applicants have amended the word "MOMED" in block T24 of FIG. 16 to read 
"MOVED." 

Applicants have amended the word "EXTARCT" in block T51 of FIG. 21 to read 
"EXTRACT." 

Attachment: 2 Replacement Sheets

REMARKS

Claims 1-79 are pending in the application. Claims 6, 8-10, 33, 34, 42-44, 48, 50-52, 75 
and 76 have been withdrawn from consideration pursuant to an election of species. 

Applicants thank the Examiner for considering the references cited with the Information 
Disclosure Statement filed on October 16, 2003. 

Applicants also thank the Examiner for acknowledging the claim for priority under 35
U.S.C. § 119, and receipt of a certified copy of the priority document.
Objections to the Specification 

The specification has been objected to for allegedly containing a hyperlink on page 1, 
line 26. Applicants have amended the paragraph on page 1 containing the text interpreted as a 
hyperlink. 

The specification has also been objected to as containing numerous grammatical errors.
The Examiner has required submission of a substitute specification. In accordance with 37
C.F.R. § 1.125, Applicants submit herewith a marked-up copy and a clean copy of a substitute
specification to correct the grammatical errors in the specification. Applicants submit that no
new matter has been added to the substitute specification.

Regarding the objection directed to antecedent basis for the phrase "a hypertext on a Web 
site to be checked target," Applicants have amended the claims to eliminate the phrase. 

Applicants respectfully request that these objections be withdrawn. 
Objections to the Drawings 

Figure 16 has been objected to for a typographical error in block T24. Applicants have
amended the word "MOMED" in block T24 to read "MOVED," as suggested by the Examiner.

Figure 21 has also been objected to for a typographical error in block T51. Applicants
have amended the word "EXTARCT" in block T51 to read "EXTRACT," as suggested by the
Examiner.

Applicants respectfully request that these objections be withdrawn. 
Objections to the Claims

Claims 1-5, 7, 11-32, 35, 38-41, 45-47, 49, 53-74 and 77 have been objected to for
various informalities. Applicants have amended the claims as suggested by the Examiner to
correct these informalities. Applicants respectfully request that the claim objections be
withdrawn.
Claim Rejections 

Claims 1-5, 7, 11-32, 35-37, 45, 53-59, 72-74 and 78 have been rejected under 35
U.S.C. § 101 as allegedly being directed to non-statutory subject matter. Applicants traverse
these rejections.

Applicants respectfully disagree with the Examiner's allegation that the claims recite an 
abstract idea, i.e., merely "checking links." As required by the Office in determining compliance 
with § 101, Applicants' invention as a whole produces the "useful, concrete and tangible result" 
of detecting logically mismatched links in web pages and further, provides means for correcting 
the detected logically mismatched links. Therefore, the claims do not recite an abstract idea. 

The Examiner further alleges that the recited invention is computer software per se. 
Applicants respectfully disagree with the Examiner's allegation. As disclosed in the 
specification, for example at FIG. 32, an apparatus of the invention may include a data 
processing unit which realizes the functionality of the computer program (Specification, page 73,
lines 12-20). The CAFC has held that such programming creates a new machine, because a
general purpose computer in effect becomes a special purpose computer once it is programmed 
to perform particular functions pursuant to instructions from program software. In re Alappat, 
31 USPQ2d 1545, 1558 (CAFC 1994). Further, a computer program used in a computerized 
process where the computer executes the instructions set forth in the computer program is 
statutory subject matter. Only when the claimed invention taken as a whole is directed to a mere 
program listing, i.e., to only its description or expression, is it descriptive material per se and 
hence nonstatutory. MPEP § 2106. Therefore, contrary to the Examiner's allegation, and in 
light of the Office requirements pertaining to § 101 of viewing the invention as a whole, the 
apparatus is not comprised solely of computer software. 

For at least the above reasons, Applicants respectfully request that the § 101 rejections be 
withdrawn. 

Claims 1-5, 7, 11-32, 35-41, 45-47, 49, 53-74 and 77-79 have been rejected under 35
U.S.C. § 112, second paragraph, as allegedly being indefinite. These claims have been amended
as suggested by the Examiner to particularly point out and distinctly claim the subject matter of
the invention. Applicants respectfully request that these § 112, second paragraph, rejections be
withdrawn.
Conclusion 

In view of the above, reconsideration and allowance of this application are now believed 
to be in order, and such actions are hereby solicited. If any points remain in issue which the 
Examiner feels may be best resolved through a personal or telephone interview, the Examiner is 
kindly requested to contact the undersigned at the telephone number listed below.



The USPTO is directed and authorized to charge all required fees, except for the Issue 
Fee and the Publication Fee, to Deposit Account No. 19-4880. Please also credit any 
overpayments to said Deposit Account. 

Respectfully submitted,



SUGHRUE MION, PLLC 
Telephone: (202) 293-7060 
Facsimile: (202) 293-7860 

WASHINGTON OFFICE 

23373 

CUSTOMER NUMBER 




Francis G. Plati, Sr.
Registration No. 59,153 



Date: October 19, 2006

SUBSTITUTE SPECIFICATION (Marked-up version) Attorney Docket No. Q77945
U.S. Application No. 10/685,456

APPARATUS, METHOD, AND COMPUTER PROGRAM PRODUCT FOR CHECKING HYPERTEXT



This application is based on Japanese patent application No. 2002-302585, the
content of which is incorporated herein by reference.



BACKGROUND OF THE INVENTION 

1. Field of the Invention

The present invention relates to an apparatus, method and computer program product for
checking web page links, and more particularly, to an apparatus, method and
computer program product for detecting errors in hyperlinks and relationships between
links and target web pages.

2. Description of the Related Art 

In recent years, companies, organizations, and individuals have had many occasions to make
computerized information public on Internet sites. Most of the information published on
these sites is hypertext.

There is disclosed a first example of the conventional technology of hypertext link
checking targeting hypertext on the Internet, in nonpatent literature describing
"LinkScan™" produced by Elsop™ (Electronic Software Publishing Corporation), available on
the Elsop website, last searched on Oct. 9th, 2002. This is a tool that automatically scans
hypertext links and compiles logs of detected link errors. The disclosed link checker
includes one type of link checker adapted to diagnose a target online in accordance with the
specified address of the target, and another type of link checker that is adapted to perform
offline diagnosis of a website downloaded to a particular folder on a hard disk.

There is disclosed a second example of the conventional technology of detecting a
physical mismatch in a link, in Japanese Non-examined Patent Publication No. 2001-273185.
The method in the conventional technology comprises the steps of: storing an address of the
link to be managed in a database; and checking whether or not there is a document at the
stored address of the link, thereby making it possible to detect a physical mismatch
such as a dead link. The above conventional method further comprises the step of
previously registering, on a system, a keyword and image for identifying each of the
documents in the database. In the conventional method, when a dead link is detected, it is
possible to search for the vanished page with a search engine and then provide a correction
candidate.

There is a third example of the conventional technology of a typical system for checking
a document including a document correcting system such as an auto-correcting function in
Microsoft® Word produced by Microsoft Corporation. These document correcting systems are
operable to detect an inappropriate expression, such as an error in a declensional "Kana"
ending (Kana being a kind of Japanese character) or a repeated Japanese postpositional
particle, and to then output a correction candidate.

A first problem to be solved is that, in the aforementioned first and second examples of
the conventional technologies, only a physically mismatched link can be detected, but a
logically mismatched link cannot be detected. This is because, in the aforementioned
conventional technologies, the judgment as to whether there is a mismatch is based only on
whether an error is returned from a server when a connection to the address of a link is
made. At present, detection of a logical mismatch has no choice but to rely on manual and
visual confirmation on a browser, because no error occurs in the case of a logical mismatch.

A second problem to be solved is that, in the aforementioned first and second examples
of the conventional technologies, it is possible to provide a correction candidate only for a
physical mismatch, and impossible to provide one for a logical mismatch. The reason for this
problem is similar to that of the above first problem.

A third problem to be solved is that the manual and visual confirmation on the browser
incurs enormous cost. The reason for this problem is that a large-scale site, such as that of
a company, has between one thousand and tens of thousands of links, and the number of
links between documents reaches between tens of thousands and hundreds of thousands.
Confirming all of these links is not realistic from the viewpoints of time and cost.
Confirmation on the browser is also apt to fail to check a phantom link and the like.

A fourth problem to be solved is that, in the aforementioned third conventional
technology, a logical mismatch, such as disunity in the hyperlinks, cannot be detected,
causing confusion by the fact that the hyperlinks have different expressions for the links
to the same documents. The reason for this problem is that a hyperlink having any
appropriate syntax may be regarded as normal.



SUMMARY OF THE INVENTION

It is therefore a first object of the present invention to provide an apparatus, method, and
computer program product for checking a link in which not only a physical mismatch
but also a logical mismatch can be detected.

It is a second object of the present invention to provide an apparatus, method, and
computer program product for checking a link in which it is possible to provide an
administrator with a correction candidate for not only a physical mismatch but also a logical
mismatch.

It is a third object of the present invention to provide an apparatus, method, and
computer program product for checking a link in which the cost of the mismatch check
can be considerably reduced.

In accordance with an aspect of the present invention, there is provided an apparatus for
checking a link, targeting a hypertext database, which detects at least one logically
mismatched link including: a link having a mismatch between a hyperlink appearing on the
source web page and contents on the target web page; a link having a mismatch between a
hyperlink and contents on the target web page that is caused by correcting contents in the
target web page; a link causing inconsistency among a plurality of different hyperlinks
having the same target web page; a link causing inconsistency in styles among a plurality of
different hyperlinks within the same page and around the pages; a link having no hyperlink;
and a link in which all of the hyperlinks in a group of links forming a loop and
corresponding to this group of links are related to a same topic.

More specifically, a first link checking apparatus comprises: an information
storing unit which stores therein information about a page and link in the hyperlink; and a
condition detecting unit for analyzing said information in said information storing unit to
detect a logically mismatched link.

A second link checking apparatus comprises: an information collecting unit
for collecting information about a page and link in the hyperlink; an information storing
unit capable of storing therein said information about the page and link; and a condition
detecting unit for analyzing said information in said information storing unit to detect
a logically mismatched link.

A third link checking apparatus comprises: the constitutional elements of the
first and second link checking apparatus; and a candidate providing unit for calculating
a correction candidate concerning said links detected by said condition detecting unit.

A fourth link checking apparatus comprises: the constitutional elements of the
third link checking apparatus; and an importance calculating unit for calculating and
outputting an importance value of the link detected by said condition detecting unit.

A fifth link checking apparatus comprises: the constitutional elements of the
third and fourth link checking apparatus; and a correction reflecting unit for reflecting,
in said hyperlink, a correction based on the mismatched link detected by said condition
detecting unit and the correction candidate calculated by said candidate providing unit.

A sixth link checking apparatus comprises: the constitutional elements of the
fourth link checking apparatus; and a total score calculating unit for calculating and
outputting a total score concerning said hyperlink in accordance with at least a
factor or a combination of a plurality of factors including: the importance value calculated
by said importance calculating unit, the number of said links detected by said condition
detecting unit, and the ratio of the number of said links detected by said condition
detecting unit to the total number of links.

A seventh link checking apparatus comprises: the constitutional elements of
the first and second link checking apparatus; and an importance calculating unit for
outputting an importance value of the links detected by said condition detecting unit.

An eighth link checking apparatus comprises: the constitutional elements of
the seventh link checking apparatus; and a total score calculating unit for calculating
and outputting a total score concerning said hypertext in accordance with at least a factor
or a combination of a plurality of factors including: the importance value calculated by said
importance calculating unit, the number of said links detected by said condition detecting
unit, and the ratio of the number of said links detected by said condition detecting unit to
the total number of links.
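
By way of non-limiting illustration only, one possible way of combining the three factors
named for the total score (the importance values, the number of detected links, and the
ratio of detected links to total links) is a weighted sum. The function name and the weights
below are assumptions made for this sketch and are not taken from the specification.

```python
def total_score(importances, total_links,
                w_importance=1.0, w_count=1.0, w_rate=100.0):
    """Combine the three factors into one score (hypothetical weighting).

    importances: importance values of the detected links, one per link.
    total_links: total number of links examined on the site.
    """
    detected = len(importances)
    # Ratio of detected links to all links; defined as 0 for an empty site.
    rate = detected / total_links if total_links else 0.0
    return (w_importance * sum(importances)
            + w_count * detected
            + w_rate * rate)
```

Under these assumed weights, two detected links with importance values 0.5 and 1.5 on a
site of ten links would give 2.0 + 2 + 20.0 = 24.0.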

In the first, second, seventh, and eighth link checking apparatus, said
condition detecting unit may be operated to group the information about said links by
predetermined conditions, and to detect the information about the links excluded from said
groups.
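
As a minimal sketch of the grouping described above, and assuming (for illustration only)
that the predetermined condition is "links sharing a target web page should share the same
hyperlink text", links can be grouped by target and any link outside the majority group
flagged. The function name and the tuple layout are hypothetical.

```python
from collections import Counter, defaultdict

def find_outlier_links(links):
    """links: iterable of (source_page, anchor_text, target_page) tuples.

    Groups links by target page and returns the links whose anchor text
    differs from the most common anchor text used for that target.
    """
    by_target = defaultdict(list)
    for source, anchor, target in links:
        by_target[target].append((source, anchor))

    outliers = []
    for target, entries in by_target.items():
        counts = Counter(anchor for _, anchor in entries)
        majority_anchor, majority_count = counts.most_common(1)[0]
        # Without a clear majority there is no group to be excluded from.
        if majority_count <= len(entries) / 2:
            continue
        for source, anchor in entries:
            if anchor != majority_anchor:
                outliers.append((source, anchor, target, majority_anchor))
    return outliers
```

In this sketch the majority anchor text returned as the fourth element could also serve as
a correction candidate of the kind provided by the candidate providing unit.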

In the first, second, seventh, and eighth link checking apparatus, said
condition detecting unit may be operated to detect a link having a mismatch between a
hyperlink appearing on the source web page and contents on the target web page. In this
case, said condition detecting unit may be operated to calculate a criteria score of the link
based on at least one of the criteria scores of the links including: (1) a first criteria
score calculated by comparing the hyperlinks of the links for the same target web page; (2) a
second criteria score calculated by comparing the target web pages of a plurality of links
represented by the same hyperlink; (3) a third criteria score calculated by comparing the
target web pages based on a plurality of links for the same target web page and the same
hyperlink; and (4) a fourth criteria score calculated by comparing the hyperlink and the
target web page in the contents, and said condition detecting unit is operated to detect a
link with a high criteria score.
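
The fourth criteria score, which compares the hyperlink with the contents of the target web
page, might for example be approximated by word overlap. The following is a sketch under
that assumption; the specification does not state how the comparison is computed, and the
function name is hypothetical.

```python
import re

def mismatch_score(anchor_text, target_text):
    """Fraction of the anchor's words that do not occur in the target page.

    Returns a value in [0.0, 1.0]; higher means a more likely mismatch.
    An anchor with no words at all is treated as a full mismatch.
    """
    anchor_words = set(re.findall(r"\w+", anchor_text.lower()))
    if not anchor_words:
        return 1.0
    target_words = set(re.findall(r"\w+", target_text.lower()))
    missing = anchor_words - target_words
    return len(missing) / len(anchor_words)
```

A link would then be reported when its score exceeds a threshold chosen by the operator.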

In the first, second, seventh, and eighth link checking apparatus, said
condition detecting unit may be operated to detect a link having a mismatch between a
hyperlink and contents on the target web page that is caused by correcting
contents in the target web page.

In this case, said condition detecting unit may be operated to calculate a criteria score
of the link based on at least one of the criteria scores of the links including: (1) a first
criteria score calculated by comparing the hyperlinks of the links for the same
target web page; (2) a second criteria score calculated by detecting at least a
notice description including a movement notice description and an expiration notice
description in the contents of the target web page; and (3) a third criteria score calculated
by comparing the description of period of validity described in the contents of the target
web page and the present date and time, and said condition detecting unit is operated to
detect a link with a high criteria score.
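
The second and third criteria scores above can be sketched as simple text checks. The notice
phrases and the "valid until YYYY-MM-DD" date format below are assumptions chosen for this
illustration; an actual implementation would need patterns suited to the pages being
checked.

```python
import re
from datetime import date

# Hypothetical phrases signalling a movement or expiration notice.
NOTICE_PATTERNS = [r"this page has moved", r"no longer available", r"has expired"]

# Hypothetical format for a period-of-validity description.
VALID_UNTIL = re.compile(r"valid until (\d{4})-(\d{2})-(\d{2})", re.IGNORECASE)

def notice_score(page_text):
    """1.0 if the page contains a movement or expiration notice, else 0.0."""
    text = page_text.lower()
    return 1.0 if any(re.search(p, text) for p in NOTICE_PATTERNS) else 0.0

def expiry_score(page_text, today):
    """1.0 if the page states a validity period that has already passed."""
    match = VALID_UNTIL.search(page_text)
    if not match:
        return 0.0
    valid_until = date(*map(int, match.groups()))
    return 1.0 if today > valid_until else 0.0
```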

In the first, second, seventh, and eighth link checking apparatus, said
condition detecting unit may be operated to detect a link causing inconsistency
among a plurality of different hyperlinks having the same target web page.

In the first, second, seventh, and eighth link checking apparatus, said
condition detecting unit may be operated to detect a link causing inconsistency in styles
among a plurality of different hyperlinks within a same web page and in a same website.

In the third through sixth link checking apparatus, said condition detecting
unit may be operated to group the information about said links by predetermined conditions,
and to detect the information about particular links excluded from said groups, while said
candidate providing unit may be operated to obtain the correction candidate so as to make
the information about said particular links uniform with that of the other, correct links.

In the third through sixth link checking apparatus, said condition detecting
unit may be operated to detect a link having a mismatch between a hyperlink and contents on
the target web page.

In this case, said condition detecting unit may be operated to calculate a criteria score
of the link based on at least one of the following scores of the links including: (1) a first
score calculated by comparing the hyperlinks of the links for the same target web page; (2) a
second score calculated by comparing the target web pages of a plurality of links
represented by the same hyperlink; (3) a third score calculated by comparing the target web
pages based on a plurality of links for the same target web page and the same hyperlink; and
(4) a fourth score calculated by comparing the hyperlink and the target web page in the
contents, and said condition detecting unit being operated to detect a link with a high
criteria score, said candidate providing unit specifying at least a sort of correction
candidate including: (1) a correction candidate of the hyperlink calculated by comparing the
hyperlinks of the links for the same target web page; (2) a correction candidate of the
hyperlink calculated by comparing the target web pages based on a plurality of links for the
same hyperlink; (3) a correction candidate of the hyperlink calculated by comparing the
target web pages based on a plurality of links for the same target web page and the same
hyperlink; and (4) a correction candidate of the hyperlink calculated by comparing the
hyperlink and the target web page in the contents.

In the third through sixth link checking apparatus, said condition detecting
unit may be operated to detect a link having a mismatch between a hyperlink and contents on
the target web page that is caused by correcting contents in the target web page.

In this case, said condition detecting unit may be operated to calculate a criteria score
of the link based on at least one of the criteria scores of the links including: (1) a first
criteria score calculated by comparing the hyperlinks of the links for the same target web
page; (2) a second criteria score calculated by detecting at least a notice description
including a movement notice description and an expiration notice description in the contents
of the target web page; and (3) a third criteria score calculated by comparing the
description of period of validity described in the contents of the target web page and the
present date and time, and said condition detecting unit is operated to detect a link with a
high criteria score, said candidate providing unit being operated to specify at least a sort
of correction candidate including: (1) a correction candidate of the hyperlink calculated by
comparing the hyperlinks of the links for the same target web page; and (2) a correction
candidate of the hyperlink calculated by extracting the information about a movement
destination from the contents of the target web page.

In the third through sixth link checking apparatus, said condition detecting
unit may be operated to detect a link causing inconsistency among a plurality of
different hyperlinks having the same target web page, said candidate providing unit being
operated to calculate the correction candidate of the hyperlink by comparing the hyperlinks
of the links for the same target web page.

In the third through sixth link checking apparatus, said condition detecting
unit may be operated to detect a link causing inconsistency in styles among a
plurality of different hyperlinks within the same page and within a same website, and said
candidate providing unit being operated to calculate the correction candidate of the style
of the hyperlink by comparing the style of a plurality of hyperlinks within the page and
within a same website including the detected links.

In the second through sixth link checking apparatus, said information
collecting unit may repeatedly collect the information about the page and link in
the hyperlink, to further store said information about the page and link a plurality of
times in said information storing unit. In this case, said condition detecting unit may be
operated to analyze said information in said information storing unit to calculate a change,
in accordance with time, in the number of targeted links corresponding to a page corrected in
the contents, and a change in sorts of hyperlink with time, so as to detect links in which
there is a mismatch between the hyperlink and the contents of the target web page.

In the first through eighth link checking apparatus, said condition detecting
unit may be operated to detect a link having no hyperlink.

In the first through eighth link checking apparatus, said condition detecting
unit may be operated to detect a link including a link having no character string or image
described as the hyperlink and a link having a character string and an image described as
the hyperlink with an inconspicuous color and size.

In the first through eighth link checking apparatus, said condition detecting
unit may be operated to detect a link in which all of the hyperlinks in a
group of links forming a loop and corresponding to this group of links are related to the
same topic.
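
One possible reading of the loop condition above can be sketched as a depth-first search
over the link graph. For illustration it is assumed that hyperlinks are "related to the same
topic" when their texts share at least one word; the specification does not define the test,
and the function name and input layout are hypothetical. The recursive search is intended
for small graphs only.

```python
def find_same_topic_loops(links):
    """links: list of (source_page, anchor_text, target_page) tuples.

    Returns each directed loop (as a list of pages, starting from its
    alphabetically smallest page) whose anchor texts all share a word.
    """
    graph = {}
    for source, anchor, target in links:
        graph.setdefault(source, []).append((anchor, target))

    loops = []

    def walk(start, page, path, anchors):
        for anchor, nxt in graph.get(page, []):
            if nxt == start:
                # Loop closed: keep it only if every anchor shares a word.
                word_sets = [set(a.lower().split()) for a in anchors + [anchor]]
                if set.intersection(*word_sets):
                    loops.append(path)
            elif nxt not in path and nxt > start:
                # Only visit pages "after" start so each loop is found once.
                walk(start, nxt, path + [nxt], anchors + [anchor])

    for start in sorted(graph):
        walk(start, start, [start], [])
    return loops
```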

In the fourth through seventh link checking apparatus, said importance
calculating unit may be operated to calculate the importance value based on at least a factor
or a combination of a plurality of factors including: (1) a sort of errors and unsuitability
of the detected links; (2) accuracy of errors and unsuitability of the detected links; (3) the
number of targeted links of the page including the detected links; (4) a record of the
frequency of access by users to the page including the detected links; and (5) a
stratification level in the hypertext of the page including the detected links, while said
importance calculating unit may be operated to calculate the importance value of the detected
links, and to control, in accordance with said level of importance value, output conditions
for the detected links including the number of records output and a method of outputting the
records.

In the second through eighth link checking apparatus, said information
collecting unit may be operated to extract the character strings corresponding to said
hyperlink by character recognition when the hyperlink is an image, and to register the
extracted character strings as said information about page and link in said information
storing unit.

The first through eighth link checking apparatus may target a hyperlink on a website.

In accordance with another aspect of the present invention, there is provided a first
link checking method comprising the steps of: (a) determining conditions for the check
of a hyperlink database so as to detect links including: links having an error in a
hyperlink; links having an error in a relationship between links; links having instability
in a hyperlink; and links having instability in a relationship between links; and (b)
displaying, on a display screen, a list having three items including: (1) a hyperlink; (2)
identification information about a source web page; and (3) identification information about
a target web page.

In the above link checking method, said step (b) may include the step of displaying
a list sorted by each of three items including: (1) a hyperlink; (2)
identification information about a source web page; and (3) identification information about
a target web page.

The above link checking method may further comprise the steps of: (b)
displaying, on a display screen, a list having three items including: (1) a hyperlink; (2)
identification information about a source web page; and (3) identification information
about a target web page; (c) allowing an operator to correct said items (1), (2), and (3) on
said display screen; and (d) reflecting all of said items corrected in said step (c) to
correct said hyperlink database.

The above link checking method may further comprise the step of specifying
the targeted hyperlink database.

A second link checking method comprising the steps of: (a) collecting
information about a page and link in a website; (b) analyzing the result of
said step (a) to detect a logically mismatched link; (c) calculating an importance value
of the link detected in said step (b) and calculating a total score concerning a website;
(d) performing periodically said steps (a) to (c) for a website specified as a target;
and (e) informing about a change with time in said total score concerning the specified
website.

A third link checking method comprising the steps of: (a) collecting
information about a page and link in a website; (b) analyzing the result of
said step (a) to detect a logically mismatched link; (c) calculating an importance value
of the link detected in said step (b) and calculating a total score concerning a website;
(d) performing periodically said steps (a) to (c) for a website specified as a target; and
(e) putting out an alert when said total score concerning the specified website and said
importance value of the detected link fulfill a predetermined condition.

A fourth link checking method comprising the steps of: (a) collecting
information about a page and link in a website; (b) analyzing the result of
said step (a) to detect a logically mismatched link; (c) calculating an importance value
of the link detected in said step (b) and calculating a total score concerning a
website; (d) performing periodically said steps (a) to (c) for a plurality of websites
each specified as a target; and (e) outputting a result of a ranking of said total scores of the
specified plural websites in order of level.
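
Steps (c) through (e) of the second through fourth methods can be pictured with a small sketch. It assumes, purely for illustration, that the total score of a website is the plain sum of the importance values of its detected mismatched links and that the predetermined condition is a simple threshold; neither is fixed by the text at this point. The function names, site names, and numbers are hypothetical.

```python
from typing import Dict, List, Tuple


def total_score(importance_values: List[float]) -> float:
    # Assumption: the total score is the plain sum of per-link importance values.
    return sum(importance_values)


def rank_sites(scores: Dict[str, float]) -> List[Tuple[str, float]]:
    # Order websites from the highest total score to the lowest.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def should_alert(score: float, threshold: float) -> bool:
    # Hypothetical predetermined condition: alert once the score reaches a threshold.
    return score >= threshold


# Hypothetical importance values of mismatched links found on two websites.
detected = {
    "site-a.example": [3.0, 1.5],
    "site-b.example": [0.5],
}
scores = {site: total_score(values) for site, values in detected.items()}
ranking = rank_sites(scores)
```

Running the same calculation at each periodic check and storing the resulting scores by date would give the change with time referred to in step (e) of the second method.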

In accordance with the first through eighth link checking apparatus, the
processes including the steps of grouping the link information by particular conditions, and
detecting a particular link excluded from the group as a mismatched link, are performed so as to
have the condition detecting unit detect the logically mismatched link, thereby making it possible
to achieve the first object of the present invention.

In accordance with the third through sixth link checking apparatus, the
candidate providing unit is operated to perform the process of calculating the correction
candidate to harmonize the link information of the particular link with the link information of
a large majority of the other appropriate links, thereby making it possible to achieve the second
object of the present invention.

In accordance with the first through sixth link checking apparatus, the logical
mismatch is automatically detected by the condition detecting unit. In accordance with the
third through sixth link checking apparatus, the correction candidate is automatically
calculated by the correction candidate providing unit. In the fifth link checking
apparatus, the logically mismatched parts are automatically corrected by the correction reflecting
unit. Therefore, the third object of the present invention can be achieved.



BRIEF DESCRIPTION OF THE DRAWINGS 




The present invention and many of the advantages thereof will be better understood
from the following detailed description when considered in connection with the accompanying
drawings, wherein:

FIG. 1 is a block diagram of a first embodiment of the hypertext checking apparatus
according to the present invention;

FIG. 2A is a diagram showing examples of a document described in the format of a
hypertext on which some links are specified;

FIG. 2B is a diagram showing examples of a display screen of the document viewed
through a browser;

FIG. 3 is a diagram showing one example of a logical mismatch due to an error link;

FIG. 4A is a diagram showing one example of a logical mismatch due to an expiration
period link;

FIG. 4B is a diagram showing one example of a logical mismatch due to an expiration
period link;

FIG. 5 is a diagram showing one example of a logical mismatch due to
inconsistency in hyperlinks;

FIG. 6A is a diagram showing one example of a logical mismatch due to inconsistency
in styles of hyperlinks;

FIG. 6B is a diagram showing one example of a logical mismatch due to inconsistency
in styles of hyperlinks;

FIG. 7A is a diagram showing one example of a logical mismatch due to a phantom
link;

FIG. 7B is a diagram showing one example of a logical mismatch due to a phantom
link;

FIG. 8 is a diagram showing one example of a logical mismatch due to a loop link;

FIG. 9 is a table of an example of the link information stored in an information storing
unit;

FIG. 10 is a flowchart showing the operation of the first embodiment of the hypertext
checking apparatus according to the present invention shown in FIG. 1;

FIG. 11 is a diagram of an example of a display screen for setting a document collection
condition in the first embodiment of the hypertext checking apparatus according to the present
invention;

FIG. 12 is a diagram of an example of a display screen for setting an extraction
condition for the mismatched link in the first embodiment of the hypertext checking apparatus
according to the present invention;

FIG. 13 is a diagram of an example of a display screen of a list of results of the extracted
mismatched link in the first embodiment of the hypertext checking apparatus according to the
present invention;

FIG. 14 is a flowchart showing the process of extracting the error link in the first
embodiment of the hypertext checking apparatus according to the present invention;

FIGS. 15A to 15D are tables of examples of the link information extracted in respective
steps in the process of extracting the error links shown in FIG. 14 in the first embodiment of the
hypertext checking apparatus according to the present invention;

FIG. 16 is a flowchart showing the process of extracting the expiration period link in the
first embodiment of the hypertext checking apparatus according to the present invention;

FIG. 17 is a flowchart showing the process of extracting the disunity in the
hyperlinks in the first embodiment of the hypertext checking apparatus according to
the present invention;

FIG. 18 is a table of an example of the link information in the step of the process of
extracting the disunity in the hyperlinks shown in FIG. 17 in the first
embodiment of the hypertext checking apparatus according to the present invention;

FIG. 19 is a flowchart showing the process of extracting the disunity in the styles of the
link source pages in the first embodiment of the hypertext checking apparatus according to the
present invention;

FIG. 20 is a table of an example of the link information in the step of the process of
extracting the disunity in the styles of the link source pages shown in FIG. 19 in the first
embodiment of the hypertext checking apparatus according to the present invention;

FIG. 21 is a flowchart showing the process of extracting the phantom link in the first
embodiment of the hypertext checking apparatus according to the present invention;

FIG. 22 is a flowchart showing the process of extracting the loop link in the first
embodiment of the hypertext checking apparatus according to the present invention;

FIG. 23 is a flowchart showing the process of extracting the link varied with time in the
link information in the first embodiment of the hypertext checking apparatus according to the
present invention;

FIG. 24 is a table of an example of the link information extracted in the step of the
process of extracting the links varied with time in the link information shown in FIG. 23 in the
first embodiment of the hypertext checking apparatus according to the present invention;

FIG. 25 is a block diagram of a second preferred embodiment of the hypertext checking
apparatus according to the present invention;

FIG. 26 is a flowchart showing the operations of the second preferred embodiment of
the hypertext checking apparatus according to the present invention shown in FIG. 25;

FIG. 27 is a diagram showing an example of a display screen of a list of results of the
extracted mismatched link in the second preferred embodiment of the hypertext checking
apparatus according to the present invention;

FIG. 28 is a block diagram of a third preferred embodiment of the hypertext checking
apparatus according to the present invention;

FIG. 29 is a flowchart showing the operations of the third preferred embodiment of the
hypertext checking apparatus according to the present invention shown in FIG. 28;

FIG. 30 is a diagram showing an example of a display screen of a line chart of a change
with time in a total score in the third preferred embodiment of the hypertext checking apparatus
according to the present invention;

FIG. 31 is a diagram showing an example of a display screen of a bar graph of a site
ranking in the total score in the third preferred embodiment of the hypertext checking apparatus
according to the present invention;

FIG. 32 is a block diagram of fourth, fifth, and sixth preferred embodiments of a system
comprising a hypertext checking program according to the present invention.

DESCRIPTION OF THE PREFERRED EMBODIMENTS 
The hypertext means a set of documents structured with a hyperlink or a link
and has a structure including links provided between the documents. One typical example of
the hypertext is the WWW (World Wide Web). The WWW is a collection of the hypertexts
described in an HTML (Hyper Text Markup Language) format, such as a document shown in FIG.
2A. The links and anchor character strings are marked with <A> tags. The document 101
shown in FIG. 2A has href attributes of the <A> tags indicative of identification information of
the documents 102, 103, and 104. The identification information of the document is generally
referred to as "a URL" or "a web address" in the WWW, but will be referred to simply as
"an address" in the present invention. The character strings "GX0011", "GX0012", and
"GX0013" interposed between the <A> tags are generally referred to as "anchor character
strings". Because an image file is often interposed between the <A> tags, the image as well as
the character string interposed between the <A> tags will be referred to as "a
hyperlink" in the present invention and treated as the same.

The <A> tag described in the document 101 shown in FIG. 2A has not
only the href attribute but also a target attribute, a style attribute, or the like. The target attribute
serves as an attribute for specifying which type of window is used to display a
document of a link target or a link destination. The style attribute serves as an attribute for
specifying what size or which colors of a font, or highlighted representation, are used to display
the hyperlink. When the document 101 shown in
FIG. 2A is viewed with a browser, the document 101 may be displayed on the display screen as
shown in FIG. 2B. The document 101 has links 201, 202, and 203 for the documents 102, 103,
and 104, respectively, having hyperlinks "GX0011", "GX0012", and
"GX0013", respectively. The document 102 may be accessed by way of the link 201 when the
hyperlink "GX0011" in the document 101 is clicked.
Similarly, the documents 103 and 104 may be accessed by way of the links 202 and 203,
respectively, when the hyperlinks "GX0012" and "GX0013",
respectively, in the document 101 are clicked.

Although the WWW has been explained above as a typical example of the hypertext, the
present invention is not limited to the WWW. The hypertext may be described
in any language including not only HTML but also, for example, XML (Extensible
Markup Language), SGML (Standard Generalized Markup Language), and so on.
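
As an illustration of how the href, target, and style attributes and the anchor character string could be read out of such a document, the following minimal sketch uses the standard Python `html.parser` module. The class name `AnchorCollector`, the sample markup, and the file names are hypothetical and are not taken from the figures.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collects href, target, style, and anchor text for each <A> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lower case.
        if tag == "a":
            attributes = dict(attrs)
            self._current = {
                "href": attributes.get("href"),
                "target": attributes.get("target"),
                "style": attributes.get("style"),
                "text": "",
            }

    def handle_data(self, data):
        # Text between <A> and </A> is the anchor character string.
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self._current["text"] = self._current["text"].strip()
            self.links.append(self._current)
            self._current = None


# Hypothetical markup standing in for a document such as document 101.
sample = (
    '<A href="doc102.html" target="_blank">GX0011</A>'
    '<A href="doc103.html">GX0012</A>'
)
collector = AnchorCollector()
collector.feed(sample)
links = collector.links
```

An attribute that is absent from the tag is reported as `None`, which makes a missing target attribute easy to tell apart from one that is set.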

In order to avoid any confusion over the term "user", a person who visits a company,
organization, or personal site to browse the hypertext is referred to as an "audience", while a
person who utilizes the present invention to administer the hypertext is referred to as an
"administrator", in the present invention.

The administration of the hypertext, however, becomes complex and difficult as the amount
of information published on the Internet increases. Therefore the rate of mismatched links,
such as a link inappropriate for the hyperlink, or a link mistaken in the
link target, increases. The mismatched link may be roughly classified into two types: a
physical mismatch and a logical mismatch.

The physical mismatch means a mismatch in which it is physically impossible to access the
link target, for example, in cases where there is no text of the link target, or where a server of
the link target is down. When the documents having these physical mismatches are accessed, the
server or the client is operated to return an error message.

In the event of the logical mismatch, it may be physically possible to access the link
target, but there is a logical error made in the link, such as a link to a page describing wrong
product information or expired campaign information. When a document including the
logically mismatched part is accessed, the server is not operated to return any error message,
because a text in the link target exists and the server in the link target runs in good order. The
audience is, however, sometimes confused by an error link, and the administrator
sometimes suffers from responses to the applications for the expired campaign applied by the
audience. The logical mismatches therefore have implications no less significant than those of
the physical mismatch. Examples of the logical mismatch include, but are not
limited to, (1) putting a link to a wrong destination, (2) putting a link to expired information,
(3) inconsistency in the hyperlinks, (4) inconsistency in
the styles of the hyperlinks, (5) a phantom link, and (6) a loop link.
Examples of each logical mismatch are described in detail in the
following with reference to the drawings.
(1) Putting a link to a wrong destination
As shown in FIG. 3, "putting a link to a wrong destination" means a mismatch caused
between the contents expected from the hyperlink appearing on the source web page
and the practical contents in the text of the target web page.
In FIG. 3, the hyperlinks of all of the links 211, 212, 213, and 214 are
the same in the description "GX0011". All of the link targets of the documents 111, 112, and 113
indicate the same document 116 which is representative of the product introduction of "GX0011",
but the link target of the document 114 indicates the wrong document 117 which is representative
of the product introduction of "GX0012". Therefore the audience can access the document 116
for the introduction information of "GX0011" as expected when
browsing the documents 111, 112, and 113, but cannot access the
document 116 as expected when browsing document 114.
When browsing the document 114, the audience is linked to information different from that
expected from the hyperlink "GX0011",
thereby causing confusion to the audience.

Moreover, all of the destinations of the links 211, 212, 213 and 215 indicate the same
document 116, but the hyperlink of link 215
incorrectly describes the destination as "GX0012". Therefore,
when browsing document 115, another product introduction which is
different from that expected from the hyperlink "GX0012" is
displayed. This will again cause confusion to the audience.

Furthermore, the document 115 has two links 215 and 216 to the
documents 116 and 117, respectively. Both of the links 215 and 216, however, have the same
hyperlink "GX0012". Therefore, the audience who browses
the document 115 finds the different contents of the documents 116 and 117 in spite of the fact
that the audience selects the same hyperlink "GX0012".

In this embodiment, the example of putting the link to the wrong destination described
above includes, but is not limited to, the error link to the product information,
and may further include a mistake of putting a link between an English document and a
Japanese document, an error link for a link to a completely unrelated page, and so forth.
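
One way to picture how a link of this kind could be singled out is to group links by their hyperlink and to flag any link whose target differs from the target used by the majority of its group. The sketch below is only an illustration under that assumption; the function name, the tuple layout, the strict-majority rule, and the string identifiers standing in for the documents of FIG. 3 are hypothetical.

```python
from collections import Counter, defaultdict


def find_minority_targets(links):
    """links: iterable of (source, hyperlink_text, target) tuples.

    Returns the links whose target differs from the most common target
    among links sharing the same hyperlink text.
    """
    by_text = defaultdict(list)
    for link in links:
        by_text[link[1]].append(link)

    suspects = []
    for group in by_text.values():
        counts = Counter(target for _, _, target in group)
        majority_target, majority_count = counts.most_common(1)[0]
        # Assumption: only flag links when a strict majority agrees on one target.
        if majority_count * 2 <= len(group):
            continue
        suspects.extend(link for link in group if link[2] != majority_target)
    return suspects


# Links 211 to 214 of FIG. 3, written with hypothetical string identifiers.
fig3_links = [
    ("doc111", "GX0011", "doc116"),
    ("doc112", "GX0011", "doc116"),
    ("doc113", "GX0011", "doc116"),
    ("doc114", "GX0011", "doc117"),
]
suspects = find_minority_targets(fig3_links)
```

With the sample data, only the link from document 114 is reported, since the other three links with the same hyperlink agree on document 116.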
(2) Putting a link to expired information

As shown in FIG. 4, "putting a link to expired information" means a mismatch caused
by a remaining expired campaign, or a remaining closed service. FIG. 4A shows a group of the
documents as of August 15th, 2002, while FIG. 4B shows a group of the documents as of
September 15th, 2002.

In FIG. 4A, it is announced, in the document 125, that a campaign is conducted for a
limited time between July 20th, 2002 and August 31st, 2002. The documents 121, 122, 123 and
124 have the same hyperlink "free admission fee" for putting
links 221, 222, 223 and 224, respectively, to the document 125 having contents of the campaign.

In FIG. 4B, it is announced, in the document 125, that the campaign is
terminated because the campaign date has expired. In the documents 121, 122 and 123,
therefore, the link for the contents of the document 125 for the campaign is already eliminated.
In the document 124, however, the link for the contents of the document 125 for the expired
campaign is not eliminated yet; therefore the link 224 to the document 125 and the
hyperlink "free admission fee" are still left. Thus, the audience who
browses the document 124 cannot be provided with the service shown in the
hyperlink "free admission fee" as expected.

In this embodiment, the example of putting a link to the expired information
described above includes, but is not limited to, the link for the expired campaign,
and may further include a link mismatch caused by transferring a first
document from an original address to another address and replacing the first document
with a second document at the original address.
The link for the expired information in this embodiment may further include a
mismatch caused by abandoning the service in the link target, or closing a site.
The case when the document is eliminated due to the expiration, however, is included
in the physical mismatch because an error occurs when accessing the document. The expired
link may be considered as a type of the error link, but in the present invention, a link
whose destination has expired is especially distinguished from the error link and
specified as the expired link.

(3) Inconsistency in hyperlinks
As shown in FIG. 5, the disunity in the hyperlinks means a
mismatch when there is an error, for example, but
not limited to, a typographical error, in the hyperlinks. In FIG. 5, the
documents 131, 132, 133, and 134 put the links 231, 232, 233, and 234 to the document 135.
All of the hyperlinks of the links 231, 232, and 233
indicate "GX Series", except for the hyperlink of
the link 234 which indicates "gX Series". Therefore, the audience who browses the document
134 may believe that a hyperlink "gX
Series" different from that of "GX Series" exists, and then follow the link 234.
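
A check of this kind can be sketched by grouping links by their target and comparing each hyperlink with the wording used by the majority of the group. The sketch below is an illustration only: it treats a hyperlink as a suspicious variant when it differs from the majority wording but is still close to it after lower-casing, using the similarity ratio of the standard `difflib` module. The function name, the threshold of 0.8, and the document identifiers are assumptions.

```python
from collections import Counter, defaultdict
from difflib import SequenceMatcher


def find_text_variants(links, min_ratio=0.8):
    """links: iterable of (source, hyperlink_text, target) tuples.

    Returns (source, variant_text, majority_text) for hyperlinks that differ
    from, but closely resemble, the majority wording for the same target.
    """
    by_target = defaultdict(list)
    for link in links:
        by_target[link[2]].append(link)

    variants = []
    for group in by_target.values():
        counts = Counter(text for _, text, _ in group)
        majority_text = counts.most_common(1)[0][0]
        for source, text, _ in group:
            if text == majority_text:
                continue
            # Compare case-insensitively so "gX Series" matches "GX Series".
            ratio = SequenceMatcher(None, text.lower(), majority_text.lower()).ratio()
            if ratio >= min_ratio:
                variants.append((source, text, majority_text))
    return variants


# Links 231 to 234 of FIG. 5, written with hypothetical string identifiers.
fig5_links = [
    ("doc131", "GX Series", "doc135"),
    ("doc132", "GX Series", "doc135"),
    ("doc133", "GX Series", "doc135"),
    ("doc134", "gX Series", "doc135"),
]
variants = find_text_variants(fig5_links)
```

A plain character-similarity measure of this sort would not, by itself, relate different scripts or loosely similar expressions; those cases would need additional normalization that is not shown here.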

In this embodiment, the example of the disunity in the
hyperlinks described above includes the difference between a capital and
small letter in the hyperlink, but is not limited thereto, and may further include:
a fluctuation between English and "katakana" characters;
differences in "katakana" descriptions, such as
"vaiorin" and "baiorin", both corresponding to "violin" in English; differences
between a "katakana" and "hiragana", another kind of Japanese character, description;
differences in vague or fuzzy similar expression, such as "event
information" and "seminar information"; and spelling errors such as "Series" and
"Selies".

(4) Inconsistency in the style of the hyperlink
As shown in FIG. 6, the disunity in the style of the hyperlink
means a mismatch in different views of the link, or different effects when
clicking on a link button, for example, due to different style or target attributes. In
FIG. 6A, the document 141 has four links 241, 242, 243, and 244, three of which specify the
target attribute as "_blank" so as to open a pop-up window to display the page of the
link target thereon. Therefore, the audience browsing the document 141 as shown
in FIG. 6B may browse the documents 142, 143, and 144 of the link targets corresponding
to links 241, 242, and 243 in pop-up windows while the document
141 is displayed on the screen. The display of a target web page
in a pop-up window is convenient particularly when browsing a
collection of links, in which the audience may browse some documents of
the different link targets one after another while browsing the original document.
However, no target attribute is specified in the link 244, thereby
causing the browser to change the display from the original page to the linked page
when the link button is clicked, rather than displaying the linked page in a pop-up
window. Since the documents
change when the link 244 is clicked, the audience must look for a link
to return to the original document 141, or use a browser return button.

In this embodiment, the example of the disunity in the style of the
hyperlink described above includes the disunity in the target attribute in
the document, but is not limited thereto, and may further include a mismatch in the different
color of some links, and in the different highlighted representation of some links, due to the
disunity in the style attribute.
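
The situation of FIG. 6A can be illustrated with a minimal sketch that compares the target attribute of each link in one document with the value used by the majority of the links in that document. The function name and the strict-majority rule are assumptions made for illustration; the same comparison could be applied to the style attribute.

```python
from collections import Counter


def find_target_outliers(links):
    """links: list of (link_id, target_attribute) pairs from one document.

    Returns the ids of links whose target attribute differs from the value
    used by a strict majority of the links.
    """
    if not links:
        return []
    counts = Counter(target for _, target in links)
    majority_target, majority_count = counts.most_common(1)[0]
    if majority_count * 2 <= len(links):
        return []  # no clear majority, so nothing is flagged
    return [link_id for link_id, target in links if target != majority_target]


# Links 241 to 244 of FIG. 6A; None stands for "no target attribute specified".
fig6_links = [(241, "_blank"), (242, "_blank"), (243, "_blank"), (244, None)]
outliers = find_target_outliers(fig6_links)
```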

(5) A phantom link

As shown in FIG. 7, the phantom link means a mismatch when the
audience browses a document but cannot find a visible link in the document
even though the link is described in the HTML description for the document. In
FIG. 7A, there is an <A> tag for specifying the link target as "HIDDEN_URL" which is
positioned between a character string, such as the header "STOCK STATUS OF GX
SERIES", and the <TABLE> tag indicating a table.
There is, however, no character string or image between these <A> tags. Therefore,
when the document 151 is browsed, the audience cannot notice that there is a link positioned
between the header and the table, as illustrated in FIG. 7B.
A crawler can search for and
follow such links, but it is difficult for the administrator to find these links. For
example, suppose that the link target "HIDDEN_URL" is indicative of a confidential file such as
a customer list. The information stored in the confidential file can be easily acquired by the
crawler; however, since the link cannot
be found by a human, unauthorized access to the confidential
information by the crawler may go undetected.

In this embodiment, the phantom link described above includes, but is not limited to, no
visible hyperlink, and may further include a
mismatch in the case where it is difficult to visually recognize the link through the browser
because the hyperlink appearing on the source web page
is described as a transparent image, an infinitesimally small image or
character, or an image or character which is the same color as that of a background. Even if it
is possible to visually see the hyperlink, it may be impossible to
distinguish the link from the body text if the style of the
hyperlink is the same as that of the body text and there is no highlighted
representation. This case, therefore, is included in the phantom link because the link cannot be
visually confirmed on the display screen of the browser.
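
The simplest case, an <A> element that has a link target but no character string or image between its tags, can be sketched with the standard Python `html.parser` module as follows. The class name, the sample markup, and the file name `doc152.html` are hypothetical; transparent, very small, or background-colored hyperlinks would require inspecting styles and images and are not covered by this sketch.

```python
from html.parser import HTMLParser


class PhantomLinkFinder(HTMLParser):
    """Records the href of every <A> element that has no text and no image."""

    def __init__(self):
        super().__init__()
        self.phantom_targets = []
        self._href = None
        self._in_anchor = False
        self._has_content = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_anchor = True
            self._href = dict(attrs).get("href")
            self._has_content = False
        elif tag == "img" and self._in_anchor:
            # An image between the <A> tags counts as a visible hyperlink.
            self._has_content = True

    def handle_data(self, data):
        if self._in_anchor and data.strip():
            self._has_content = True

    def handle_endtag(self, tag):
        if tag == "a" and self._in_anchor:
            if self._href and not self._has_content:
                self.phantom_targets.append(self._href)
            self._in_anchor = False


# Hypothetical markup loosely modeled on the arrangement described for FIG. 7A.
sample = (
    "<H1>STOCK STATUS OF GX SERIES</H1>"
    '<A href="HIDDEN_URL"></A>'
    '<TABLE><TR><TD><A href="doc152.html">GX0011</A></TD></TR></TABLE>'
)
finder = PhantomLinkFinder()
finder.feed(sample)
phantoms = finder.phantom_targets
```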
(6) A loop link

As shown in FIG. 8, the loop link means a mismatch where the audience
sequentially follows links for certain information, resulting in the
return to the original page. In FIG. 8, the document 161 has a link 261 to the document 162
with the hyperlink appearing on the source web page
"Information about a present". The document 162 has a link 262 to the
document 163 with the link description "Digital camera present". Finally, the
document 163 has a link 263 to the document 161 with the
hyperlink "Click here to a present". When the audience browsing the document
161 is interested in "Information about a present" in the document 161, the
audience will follow the link 261. The audience may find that there is also the
link 262 having the hyperlink "Digital camera present" in the
document 162. Therefore, the audience may expect more information about the present to be
followed by the next link, and then may access the document 163. However, the document 163
has the hyperlink "Click here to a present". Therefore,
the audience may intend to acquire desired information and then follow the link 263.
Ultimately, the link 263 will be followed to the original document 161. The audience may
be confused about where to find the desired information. Thus, the
loop link causes a problem that the audience will wander through documents without any desired
information.
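
Viewed as a graph in which pages are nodes and links are directed edges, a loop of this kind is a path that leaves a page and eventually returns to it. The following sketch searches for such a path with a depth-first traversal; it is an illustration only, and the function name and the string identifiers for the documents of FIG. 8 are hypothetical. It looks only at the link structure and does not consider the wording of the hyperlinks.

```python
def find_loop(graph, start):
    """graph: dict mapping a page to the list of pages it links to.

    Returns the first path found that leaves `start` and comes back to it,
    or None when no such path exists.
    """
    stack = [(start, [start])]
    while stack:
        page, path = stack.pop()
        for target in graph.get(page, []):
            if target == start:
                return path + [start]
            # Do not revisit a page already on the current path.
            if target not in path:
                stack.append((target, path + [target]))
    return None


# Links 261, 262, and 263 of FIG. 8.
fig8_graph = {
    "doc161": ["doc162"],
    "doc162": ["doc163"],
    "doc163": ["doc161"],
}
loop = find_loop(fig8_graph, "doc161")
```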

First preferred embodiment
Referring now to FIG. 1 of the drawings, there is shown a first preferred embodiment of
the hypertext checking apparatus according to the present invention.

The first embodiment of the hypertext checking
apparatus according to the present invention includes a data processing unit 1 operated under
program control, a storage device 2 capable of storing information, an input unit 3, such as a
keyboard, and an output device 4, such as a displaying unit, a printer, and so on.

The data processing unit 1 includes an information collecting unit 11, a candidate 
providing unit 12, a condition detecting unit 13, and a correction reflecting unit 14. 

The storage device 2 includes a hypertext database 21 and an information storing unit
22.

The information collecting unit 11 is designed to fetch documents from the hypertext
database 21 included in the storage device 2, to retrieve link information, and to store the link
information in the information storing unit 22. In this embodiment, the link information may
include some items such as an address of the source web page, an address of the target
web page, a hyperlink, a target attribute, a style attribute, and so on. The
information storing unit 22 may record thereon a body of the document, an updated date, a date
and time of acquisition, and a condition when the document is acquired, such as an error or
success, in addition to the link information.
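
The items of the link information listed above could be held in a record such as the following. This is a minimal sketch: the class name, the field names, and the sample values are hypothetical and are chosen only to mirror the items named in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LinkInfo:
    source_address: str                     # address of the source web page
    target_address: str                     # address of the target web page
    hyperlink: str                          # anchor text, or an image address
    target_attribute: Optional[str] = None  # e.g. "_blank"; None if unspecified
    style_attribute: Optional[str] = None   # style identifier; None if unspecified


# Hypothetical record for a link from one document to another.
record = LinkInfo("doc101", "doc102", "GX0011", target_attribute="_blank")
```

Making the record immutable and hashable allows link records to be grouped and counted directly, which suits the grouping performed by the condition detecting unit.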

The condition detecting unit 13 is designed to group the links stored in the information 
storing unit 22 in accordance with the link information, and to extract a particular link among the 
links grouped in a same group as a mismatched link, from the information storing unit 22. 

The candidate providing unit 12 is designed to provide a correction candidate
corresponding to the link which is extracted as the mismatched link by the condition detecting
unit 13. In this embodiment, the correction candidate includes information about which of the
items of the link information of the mismatched link should be corrected, and how the item
should be corrected. The candidate providing unit 12 outputs the correction candidate to the
correction reflecting unit 14.
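
A correction candidate of this form, stating which item to correct and how, can be sketched by comparing the mismatched link with the other links of its group and proposing, for each differing item, the value used by the majority. The function name, the dictionary layout, and the sample identifiers are assumptions made for illustration.

```python
from collections import Counter


def propose_correction(mismatched, peers, fields=("hyperlink", "target")):
    """Suggests, for each field, the value used by the majority of peer links.

    Returns a dict mapping each field to correct to its current value
    ("from") and its proposed value ("to").
    """
    candidate = {}
    for field in fields:
        counts = Counter(peer[field] for peer in peers)
        if not counts:
            continue
        majority_value = counts.most_common(1)[0][0]
        if mismatched[field] != majority_value:
            candidate[field] = {"from": mismatched[field], "to": majority_value}
    return candidate


# Hypothetical group: three consistent links and one mismatched link.
peers = [
    {"hyperlink": "GX0011", "target": "doc116"},
    {"hyperlink": "GX0011", "target": "doc116"},
    {"hyperlink": "GX0011", "target": "doc116"},
]
mismatched = {"hyperlink": "GX0011", "target": "doc117"}
candidate = propose_correction(mismatched, peers)
```

The result names only the items that differ, so the administrator can confirm or reject each proposed change separately.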

The correction reflecting unit 14 is designed to allow the administrator to confirm the
outputted mismatched link and the correction candidate so as to reflect the confirmed result in
the hypertext database 21.

The hypertext database 21 is capable of storing therein a set of hypertexts included in
targeted sites to be inspected. The local storage device 2 does not need to include the entire
hypertext database 21, and some parts of the hypertext database 21 may be distributed across a
network, just as a group of hypertexts is distributed across the Internet.

The information storing unit 22 is capable of storing therein information about links
included in each document in the hypertext database 21. FIG. 9 shows an example of the link
information. For example, the link information included in the document 101 shown in FIGS.
2A and 2B is illustrated in FIG. 9. It will be understood from FIG. 9 that the document 101 has:
a link 201 which is linked to the document 102 by way of a
hyperlink "GX0011"; a target attribute of which is designated by "_blank"; and a style attribute
of which is designated by "st01". Although the hyperlink is
described as a text format in this embodiment, the hyperlink
may be designated by an address of the specified image file when the
hyperlink is specified as an image. Furthermore, there may be provided a character
recognition module. The character recognition module may be executed upon the image file so
as to extract a text embedded in the image and to store the extracted text in the information
storing unit 22.

The operation of the hypertext checking apparatus of the first embodiment will be 
described below with reference to FIGS. 1 and 9 to 13. 

Firstly, the information collecting unit 11 is operated to read out the document from the 
hypertext database 21 based on the collection condition setting inputted by the input unit 3 (the 
step S1 in FIG. 10). In this embodiment, the document may be accessed by way of HTTP 
(Hypertext Transfer Protocol) when the hypertext database 21 is the WWW (World Wide Web). 
Conventionally, such a function has been implemented with a Web browser, such as IE (Internet 
Explorer produced by Microsoft Corporation), or with Web search engines of a robot type, a 
so-called crawler or spider. 

FIG. 11 shows a display screen for setting the collection when the hypertext database 21 
is the WWW. As shown in FIG. 11, this display screen is designed to allow the user to specify: 
a domain name of the site for an analysis target; a target number of pages for documents to be 
collected; a file extension of the target document; a time interval between accesses to the 
server; a retry count for failure in collection; a timeout duration for the collection; and a 
depth of a hierarchy of the recursion when the information is recursively collected by 
following links. In FIG. 11, the display screen further includes an execute button which is 
operated to initiate the collection of the hypertexts. 

Next, the HTML descriptions of the collected documents are analyzed by the 
information collecting unit 11, so that the link information is extracted as shown in FIG. 9 and 
then stored in the information storing unit 22 (the step S2 in FIG. 10). 
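
The analysis in the step S2 can be pictured with a minimal sketch that uses Python's standard html.parser module. It is only an illustration under stated assumptions, not the implementation of the information collecting unit 11: the class name LinkExtractor, the record field names, the file names, and the reading of the "style attribute" as the HTML class attribute are all hypothetical.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects one link record per <a href=...> element (hypothetical helper)."""

    def __init__(self, source_page):
        super().__init__()
        self.source_page = source_page
        self.links = []        # finished link records
        self._current = None   # record for the <a> element being read

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attrs = dict(attrs)
            if "href" in attrs:
                self._current = {
                    "source": self.source_page,
                    "target": attrs["href"],
                    "hyperlink": "",                     # anchor text, filled in below
                    "target_attr": attrs.get("target"),
                    "style": attrs.get("class"),         # assumption: style = class attribute
                }

    def handle_data(self, data):
        if self._current is not None:
            self._current["hyperlink"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self._current["hyperlink"] = self._current["hyperlink"].strip()
            self.links.append(self._current)
            self._current = None


# Hypothetical document resembling the link 201 of FIG. 9.
html = '<p><a href="doc102.html" target="_blank" class="st01">GX0011</a></p>'
parser = LinkExtractor("doc101.html")
parser.feed(html)
parser.close()
print(parser.links)
```

Each record carries the items named in the text (link source, link target, hyperlink, target attribute, style attribute), so it could be stored as one row of the link information.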

The condition detecting unit 13 is then operated to extract the link which fulfills the 
extraction condition as the mismatched link from the information storing unit 22, based on the 
extraction conditions inputted by the input unit 3 (the step S3 in FIG. 10). 

FIG. 12 shows a display screen for setting the extraction conditions. As shown in FIG. 12, 
the display screen is designed to allow the user to specify which kinds of mismatched links are 
to be extracted, such as a dead link, i.e., a physical mismatched link, an error link, a link for 
expired information, inconsistency in hyperlinks, inconsistency in the styles of hyperlinks, a 
phantom link, and a loop link. When the link for a particular address has already been proved 
to be a mismatched link, this address can be inputted to a "particular URL" column as shown in 
FIG. 12, so that the link including the link target having the inputted address can also be 
extracted. When too many mismatched links are extracted, the number of records of mismatched 
links to be displayed on a display screen can be limited. There is also provided an execute 
button for allowing the user to issue an instruction to start the extraction of the mismatched 
link. 

The extraction of the dead link among the kinds of the mismatched links can be 
realized by the aforesaid conventional method, and its description is therefore omitted in this 
embodiment. The method of extracting the link having a particular URL for a link source is 
obvious to those skilled in the art, and its description is therefore also omitted in this 
embodiment. The method of extracting the remaining, logically mismatched links will be 
described below. 

The candidate providing unit 12 is then operated to provide a correction candidate so as 
to eliminate the mismatch in the link extracted as the mismatched link by the condition detecting 
unit 13 (the step S4 in FIG. 10), and to output a list of the results on a display screen (the step 
S5 in FIG. 10). 

FIG. 13 shows an example of the display screen of the list of the results of the extracted 
mismatched links. The list of the results has a plurality of items such as kinds of mismatched 
links, a correction candidate, a link ID, a source web page, a target web page, a hyperlink, a 
target attribute, and a style attribute. As shown in FIG. 13, the links are divided into groups 
such that the links having the same "target web page" and "hyperlink" are grouped in the same 
group. The grouped links are respectively given kinds of mismatched link and correction 
candidates and then displayed on the display screen. 

When the link source address or the link target address is clicked, the corresponding 
document can be accessed. The correction candidate outputted by the system is indicated in the 
column of the "correction candidate". The column of the "correction candidate" has two 
sections divided by a colon ":", one of which includes items of the link information to be 
corrected and the other of which includes information about how to correct. For example, the 
representation "link: delete" means that the link should be deleted. The representation 
"hyperlink: "What's New"" means that the hyperlink should be changed to "What's New". This 
correction candidate may be re-written by the administrator after confirming. 

The administrator can then confirm the mismatched link and the correction candidate 
outputted on the list (the step S6 in FIG. 10). Referring to FIG. 13, the links having the same 
target web page and hyperlink are grouped. Therefore, once the administrator confirms a 
representative example of each of the mismatched links, the administrator does not need to 
confirm all of the links. For example, it is understood from the list of the results shown in 
FIG. 13 that all of the links having the link IDs 271 to 274 have the same target web page 
indicative of the document 175, the same hyperlink indicative of "ox campaign now underway", 
the kind of mismatched link indicative of the link for the expired information, and the 
correction candidate indicative of "link: delete". Therefore, it is understood that all of the 
links of the link IDs 271 to 274 should be deleted. All the administrator has to do is to access 
the document 171 to confirm the validity of the mismatched link and correction candidate of the 
link 271. The administrator does not have to confirm all of the remaining links 272 to 274. 
Therefore, it is possible to cut the cost of the confirmation. 

When there are a plurality of correction candidates, the administrator may be provided 
with a plurality of correction candidates, such as "link target: document 177 OR hyperlink: 
product B" in FIG. 13, which are partitioned by "OR". In this case, the administrator may select 
a necessary correction candidate based on the result of the confirmation. When the 
administrator judges that the correction candidate is wrong in accordance with the result of the 
confirmation, the administrator may correct this error. For example, the correction candidate of 
the links 278 and 279 is indicative of "hyperlink: What's New" in FIG. 13. The correction 
candidate can be changed to "target web page: document 180", if the administrator considers it 
appropriate that the target web page address should be changed to the document 180. When the 
administrator judges that the correction should not be done, the column of the correction 
candidate may be left blank, thereby making it possible to cancel the correction in the 
following step. 

When the administrator operates the button of "reflect correction" shown in FIG. 13, the 
correction reflecting unit 14 is operated to correct each of the documents in the hypertext 
database 21 in accordance with the correction candidates confirmed by the administrator (the 
step S7 in FIG. 10). When there are a plurality of correction candidates which are still 
connected with each other by "OR" at this stage, only the first correction candidate may be 
reflected. 
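
The handling of the "correction candidate" column can be sketched as follows. This is a minimal illustration that assumes the column is a plain string in the "item: correction" form with alternatives joined by " OR "; the function names parse_candidates and candidate_to_reflect are hypothetical and do not appear in the specification.

```python
def parse_candidates(text):
    """Split a correction-candidate column into (item, correction) pairs."""
    text = text.strip()
    if not text:
        return []  # a blank column means the correction is cancelled
    pairs = []
    for part in text.split(" OR "):
        # Only the first colon separates the item from the correction,
        # so a correction that itself contains a colon stays intact.
        item, _, how = part.partition(":")
        pairs.append((item.strip(), how.strip().strip('"')))
    return pairs


def candidate_to_reflect(text):
    """Return the pair to reflect in the step S7: only the first alternative."""
    pairs = parse_candidates(text)
    return pairs[0] if pairs else None


print(parse_candidates("link: delete"))
print(parse_candidates('hyperlink: "What\'s New"'))
print(candidate_to_reflect("link target: document 177 OR hyperlink: product B"))
print(candidate_to_reflect(""))
```

Taking the first alternative when "OR" remains mirrors the behaviour described for the step S7, and an empty column yields no correction at all.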

The display screen of the list of the results further includes links "sort" at the items of 
the source web page, the target web page, and the hyperlink, as shown in FIG. 13. These links 
are adapted to sort records of the result of extraction by using each item as the sort key. For 
example, in response to a click of the link "sort" of the item "link source", the records of the 
result of extraction can be sorted by the link source document. Therefore, it is possible to 
grasp a tendency for each kind of the mismatched links to occur, which is useful for correcting 
the mismatched links by hand. In response to a click of the link "sort" of the item "link 
target", the records of the result of extraction can be sorted by the link target document. 
Therefore, it is possible to grasp the situation of occurrence of the mismatched link in a 
particular document, so that the mismatched link occurring in an important document, such as a 
document inundated with accesses, can be investigated. In response to a click of the link 
"sort" of the item "hyperlink", the records of the result of extraction can be sorted by the 
hyperlink. Therefore, it is possible to grasp a tendency for each kind of the hyperlink to cause 
the mismatch, so that the suitability of the expression for the hyperlink can be investigated. 

Although it is described in this embodiment that the administrator corrects the 
hyperlink, the target web page, and so on, in the column of the "correction candidate" 
displayed on the display screen of the list of the results in FIG. 13, the present invention is 
not limited to that embodiment. The administrator may directly re-write the records in the 
columns such as the "link source", the "link target", and the "hyperlink" on the display 
screen. Further, although it is described in this embodiment that the display screen of the 
setting for the collection of the hypertexts and the display screen of the setting for the 
extraction conditions are separately provided, a single display screen may be provided for 
setting the collection of the hypertexts and setting the extraction conditions at the time of 
starting the analysis in another embodiment. In this case, the steps S1 to S5 shown in FIG. 10 
may be automatically performed. The present invention is not limited to the embodiments 
described above. 

Furthermore, although it is described in this embodiment that the administrator confirms 
the outputted mismatched link and the correction candidate in the step S6, the step S6 may be 
omitted and the rest of the steps S1 to S7 may be automatically performed in another 
embodiment. The present invention is not limited to the embodiments described above. 

Furthermore, although it is described in this embodiment that the administrator decides 
the timing to start the analysis, the present invention is not limited to that embodiment. In 
another embodiment, there may be provided a method having the steps of: previously setting the 
collection and extraction conditions; automatically performing the steps S1 to S5 at fixed 
intervals; and notifying the administrator of the obtained result by an electronic mail or the 
like. The present invention is not limited to the embodiments described above. 

An embodiment of the detection of the error link 

The operations of the condition detecting unit 13 and the candidate providing unit 12 
will be described in detail below, with reference to FIGS. 3, 14 and 15A to 15D. In 
this embodiment, the information storing unit 22 is capable of storing the link information about 
the group of documents shown in FIG. 3. 

Firstly, the condition detecting unit 13 is operated to read out the link information from 
the information storing unit 22 and to divide the links into groups in accordance with the link 
information. The condition detecting unit 13 divides links having the same hyperlink into the 
same group. Then, the condition detecting unit 13 further divides the links in the same group 
which have the same link target into the same sub-group. Then, the condition detecting unit 13 
extracts the links which have different link targets. The condition detecting unit 13 is 
further operated to give a criteria score to each of the links in accordance with the number of 
links included in the sub-group (the step T11 in FIG. 14). 

FIG. 15A shows an example of the links extracted and the criteria scores given in the 
step T11. It can be understood from FIG. 15A that the links 211, 212, 213, and 214 are grouped 
as these links have the same hyperlink "GX0011", while the links 215 and 216 are grouped as 
these links have the same hyperlink "GX0012". The three links 211, 212 and 213 in the group 
having the hyperlink "GX0011" are further sub-grouped as these links have the same link target 
"document 116", while the link 214 is grouped into a sub-group having the link target "document 
117". The link 215 in the group having the hyperlink "GX0012" is grouped into a sub-group 
having the link target "document 116", while the link 216 is grouped into a sub-group having the 
link target "document 117". 

The method of giving the criteria score includes the steps of: setting the criteria score 
for each of the groups to "1"; setting the criteria score for each of the sub-groups to a value 
which is obtained by distributing the criteria score of the group in inverse proportion to the 
number of links in the sub-groups; and setting the criteria score for each of the links to a 
value which is obtained by dividing the criteria score of each of the sub-groups equally by the 
number of the links in the sub-group. 

For example, as shown in FIG. 15A, the group of the hyperlink "GX0011" is given the 
criteria score "1". When the criteria score is distributed in inverse proportion to the number 
of the links in the sub-group, the sub-group of the link target address "document 116" is given 
the criteria score "1/4", while the sub-group of the link target address "document 117" is given 
the criteria score "3/4". The criteria score "1/4" of the sub-group is divided equally among 
the three links 211, 212, and 213, thereby giving the criteria score "1/12" for each of the 
links 211, 212, and 213. Similarly, each of the links 215 and 216 is given the criteria score 
"1/2". 
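
The scoring of the step T11 can be reproduced with a short sketch using exact fractions. It is an illustration only: the function name criteria_scores and the record layout are hypothetical, and the normalisation (weights proportional to the reciprocal of the sub-group size, scaled so that they sum to the group score "1") is one reading of "inverse proportion" that agrees with the two-sub-group example above; how the specification would treat three or more sub-groups is an assumption.

```python
from collections import defaultdict
from fractions import Fraction


def criteria_scores(links, group_key, sub_key):
    """Return {link id: criteria score} for one grouping condition."""
    groups = defaultdict(lambda: defaultdict(list))
    for link in links:
        groups[link[group_key]][link[sub_key]].append(link["id"])

    scores = {}
    for subgroups in groups.values():
        if len(subgroups) < 2:
            continue  # every link in the group agrees, so nothing is extracted
        # Each sub-group gets a share of the group score 1 proportional to 1/size.
        inverse_total = sum(Fraction(1, len(ids)) for ids in subgroups.values())
        for ids in subgroups.values():
            sub_score = Fraction(1, len(ids)) / inverse_total
            for link_id in ids:
                scores[link_id] = sub_score / len(ids)  # split equally among links
    return scores


# Link information as read from the description of FIG. 15A.
links = [
    {"id": 211, "hyperlink": "GX0011", "target": "document 116"},
    {"id": 212, "hyperlink": "GX0011", "target": "document 116"},
    {"id": 213, "hyperlink": "GX0011", "target": "document 116"},
    {"id": 214, "hyperlink": "GX0011", "target": "document 117"},
    {"id": 215, "hyperlink": "GX0012", "target": "document 116"},
    {"id": 216, "hyperlink": "GX0012", "target": "document 117"},
]

t11 = criteria_scores(links, "hyperlink", "target")
for link_id, score in sorted(t11.items()):
    print(link_id, score)
```

Swapping the two key arguments, that is, grouping by "target" and sub-grouping by "hyperlink", gives the grouping used in the step T12.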

In the following step T12 in FIG. 14, the condition detecting unit 13 is operated to read 
out the link information from the information storing unit 22 and to divide the links into 
groups in accordance with the link information. The condition detecting unit 13 divides links 
having the same link target into the same group. Then, the condition detecting unit 13 further 
divides the links in the same group which have the same hyperlink into the same sub-group. 
Then, the condition detecting unit 13 extracts the links which have different hyperlinks. The 
condition detecting unit 13 is further operated to give a criteria score to each link in 
accordance with the number of links included in the sub-group. 

FIG. 15B shows an example of the links extracted and the criteria scores given in the 
step T12. It can be understood from FIG. 15B that the links 211, 212, 213, and 215 are grouped 
as these links have the same link target "document 116", while the links 214 and 216 are 
grouped as these links have the same link target "document 117". The three links 211, 212 and 
213 in the group having the link target "document 116" are further sub-grouped as these links 
have the same hyperlink "GX0011", while the link 215 is grouped into a sub-group having the 
hyperlink "GX0012". The link 214 in the group having the link target "document 117" is grouped 
into a sub-group having the hyperlink "GX0011", while the link 216 is grouped into a sub-group 
having the hyperlink "GX0012". 

The method of giving the criteria score is the same as in the step T11. Thus, in the step 
T12, the criteria score of each of the links 211, 212 and 213 becomes "1/12", the criteria score of 
the link 215 becomes "3/4", and the criteria score of each of the links 214 and 216 becomes 
"1/2". 

In the following step T13 in FIG. 14, the condition detecting unit 13 is operated to read 
out the link information from the information storing unit 22 and to divide the links into 
groups in accordance with the link information. The condition detecting unit 13 divides links 
having the same link source and hyperlink into the same group. Then, the condition detecting 
unit 13 further divides the links in the same group which have the same link target into the 
same sub-group. Then, the condition detecting unit 13 extracts the links which have different 
link targets. The condition detecting unit 13 is further operated to give a criteria score to 
each link in accordance with the number of links included in the sub-group. 

FIG. 15C shows an example of the links extracted and the criteria scores given in the 
step T13. It can be understood from FIG. 15C that the links 215 and 216 are grouped in the same 
group as these links have the same link source "document 115" and hyperlink "GX0012". The link 
215 is further grouped into a sub-group having the link target "document 116", while the link 
216 is grouped into a sub-group having the link target "document 117". 

The method of giving the criteria score is also the same as in the step T11. Thus, in the 
step T13, the criteria score of each of the links 215 and 216 is "1/2". 

In the following step T14 in FIG. 14, the condition detecting unit 13 is operated to read 
the link information from the information storing unit 22 and to extract, in accordance with the 
link information, the links the hyperlink of which includes words that are not included in the 
title, the header or the highlighted character string in the link target document thereof. The 
condition detecting unit 13 gives the criteria score "1" to each of the extracted links. 

FIG. 15D shows an example of the links extracted and the criteria scores given in the 
step T14. It can be understood from FIG. 3 that as for the links 214 and 215 shown in FIG. 15D, 
the words included in the hyperlink are not expressed in the link target documents. 
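
The content check of the step T14 can be sketched as a word-set comparison. This is a simplified illustration, not the specified method: the helper names, the tokenisation by a regular expression, and the sample titles and headers are hypothetical, since the contents of the documents of FIG. 3 are not reproduced in the text.

```python
import re


def words(text):
    """Lower-cased set of alphanumeric words in a string."""
    return set(re.findall(r"\w+", text.lower()))


def fails_content_check(hyperlink, target_texts):
    """True when the hyperlink contains a word found in none of the given texts.

    target_texts stands for the title, headers, and highlighted strings
    taken from the link target document.
    """
    known = set()
    for text in target_texts:
        known |= words(text)
    return bool(words(hyperlink) - known)


# Hypothetical title and header of a link target document.
target_texts = ["GX0012 specifications", "Overview"]
print(fails_content_check("GX0011", target_texts))  # hyperlink word is missing
print(fails_content_check("GX0012", target_texts))  # hyperlink word is present

# In the step T14 every link that fails the check receives the criteria score 1.
t14_score = 1 if fails_content_check("GX0011", target_texts) else 0
```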

In the following step T15, the condition detecting unit 13 is operated to sum up the 
criteria score of each of the links. Therefore, the criteria score of each of the links 211, 212, 
and 213 becomes "1/6" obtained by an equation "1/12+1/12=1/6". The criteria score of the link 
214 becomes "9/4" obtained by an equation "3/4+1/2+1=9/4". The criteria score of the link 
215 becomes "11/4" obtained by an equation "1/2+3/4+1/2+1=11/4". The criteria score of the 
link 216 becomes "3/2" obtained by an equation "1/2+1/2+1/2=3/2". 
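
The sums of the step T15 can be checked with exact fractions. The per-step scores below are copied from the descriptions of the steps T11 to T14; the dictionary layout and variable names are only illustrative.

```python
from fractions import Fraction as F

# Criteria scores per step, keyed by link ID, as stated for FIGS. 15A to 15D.
step_scores = {
    "T11": {211: F(1, 12), 212: F(1, 12), 213: F(1, 12),
            214: F(3, 4), 215: F(1, 2), 216: F(1, 2)},
    "T12": {211: F(1, 12), 212: F(1, 12), 213: F(1, 12),
            215: F(3, 4), 214: F(1, 2), 216: F(1, 2)},
    "T13": {215: F(1, 2), 216: F(1, 2)},
    "T14": {214: F(1), 215: F(1)},
}

# Step T15: add up, for every link, the scores it received in each step.
totals = {}
for scores in step_scores.values():
    for link_id, score in scores.items():
        totals[link_id] = totals.get(link_id, F(0)) + score

for link_id in sorted(totals):
    print(link_id, totals[link_id])
```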

In the following step T16 in FIG. 14, the condition detecting unit 13 is operated to 
compare the sums of the criteria scores of sub-groups, and then to extract the 
links having the higher criteria score as a mismatched link. The candidate providing unit 12 
provides the correction candidate for extracted links under each condition so 
as to harmonize link information about the link having the higher score with that of the lower 
score in the same group. 

As shown in FIG. 15A, in the group of the hyperlink "GX0011", 
the sum of the criteria scores of the sub-group including the links 211, 212 and 213 becomes 
"1/2" obtained by an equation "1/6+1/6+1/6=1/2", and the sum of the criteria scores of the 
sub-group including the link 214 becomes "9/4". Therefore, the link 214 which has the higher 
criteria score is decided as the mismatched link in this case. In order to harmonize the link 
information about the link 214 with that of the sub-group including the links 211, 212 and 213, it 
can be understood that the correction candidate for the link 214 is appropriately obtained as "link 
target: document 116". 

Furthermore, in the group of the hyperlink "GX0012" in FIG. 
15A, the sum of the criteria scores of the sub-group including the link 215 becomes "11/4", and 
the sum of the criteria scores of the sub-group including the link 216 becomes "3/2". Therefore, 
the link 215 is decided as the mismatched link in this case. In order to harmonize the link 
information about the link 215 with that of the sub-group including the link 216, it can be 
understood that the correction candidate for the link 215 is appropriately obtained as "link target: 
document 117". By the same token, in FIG. 15B, the links 214 and 215 are decided as the 
mismatched links, and the correction candidates thereof are decided as "hyperlink: GX0012" and 
"hyperlink: GX0011", respectively. By the same token, in FIG. 15C, the link 215 is decided as 
the mismatched link, and the correction candidate thereof is decided as "target web page: 
document 117". It is understood from the above results that the mismatched links are the links 
214 and 215, and the correction candidates of the links 214 and 215 are "link target: document 
116" OR "hyperlink: GX0012", and "target web page: document 117" OR "hyperlink: GX0011", 
respectively. 

Although it is described in this embodiment that the link having the higher sum of the 
criteria score is decided as the mismatched link, the present invention is not limited to that 
example. In another embodiment, there is provided a method of deciding the mismatched link 
having the steps of: setting a predetermined threshold for the criteria score; and deciding the 
link as the mismatched link only when the criteria score thereof is higher than the threshold, 
even if the criteria score thereof is higher than those of others. The present invention is not 
limited to the embodiments as described above. 
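
The comparison of the step T16, together with the optional threshold, can be sketched as below. This is a minimal illustration under stated assumptions: the function name detect_mismatches and the record layout are hypothetical, only the sub-group with the highest sum is flagged when a group has more than two sub-groups, and the threshold is applied to the summed criteria score of each individual link.

```python
from collections import defaultdict
from fractions import Fraction as F

# Link information (FIG. 15A) and the summed criteria scores of the step T15.
LINKS = [
    {"id": 211, "hyperlink": "GX0011", "link target": "document 116"},
    {"id": 212, "hyperlink": "GX0011", "link target": "document 116"},
    {"id": 213, "hyperlink": "GX0011", "link target": "document 116"},
    {"id": 214, "hyperlink": "GX0011", "link target": "document 117"},
    {"id": 215, "hyperlink": "GX0012", "link target": "document 116"},
    {"id": 216, "hyperlink": "GX0012", "link target": "document 117"},
]
TOTALS = {211: F(1, 6), 212: F(1, 6), 213: F(1, 6),
          214: F(9, 4), 215: F(11, 4), 216: F(3, 2)}


def detect_mismatches(links, totals, group_key, sub_key, threshold=None):
    """Return {link id: correction candidate} for one grouping condition."""
    groups = defaultdict(lambda: defaultdict(list))
    for link in links:
        groups[link[group_key]][link[sub_key]].append(link["id"])

    candidates = {}
    for subgroups in groups.values():
        sums = {value: sum(totals[i] for i in ids)
                for value, ids in subgroups.items()}
        highest = max(sums, key=sums.get)
        lowest = min(sums, key=sums.get)
        if sums[highest] == sums[lowest]:
            continue  # a single sub-group or a tie: nothing stands out
        for link_id in subgroups[highest]:
            if threshold is None or totals[link_id] > threshold:
                # Harmonize the flagged link with the lowest-scoring sub-group.
                candidates[link_id] = f"{sub_key}: {lowest}"
    return candidates


by_hyperlink = detect_mismatches(LINKS, TOTALS, "hyperlink", "link target")
by_target = detect_mismatches(LINKS, TOTALS, "link target", "hyperlink")
print(by_hyperlink)
print(by_target)
```

Joining the two results per link with "OR" would give combined candidates of the kind listed for the links 214 and 215.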

Furthermore, although it is described in this embodiment that the criteria score is 
calculated, for example, based on the number of the links in each of the sub-groups, the present 
invention is not limited to that example. The criteria score may be simply the number of 
extractions. In another embodiment, there may be provided a method of calculating the criteria 
score having the steps of: specifying a characteristic vector of the link as the number of links 
in the sub-group; preparing a characteristic vector of the mismatched link as teaching data; 
and calculating a mean of the distances between the characteristic vector of the link and the 
characteristic vector of the mismatched link to obtain the criteria score. The present 
invention is not limited to the embodiments described above. 

Furthermore, although it is described in this embodiment that the extraction conditions 
of the error link are calculated by summing up the criteria scores including: (1) a first criteria 
score calculated by comparing the hyperlinks of the plural links for the same target web page; 
(2) a second criteria score calculated by comparing the target web pages of a plurality of links 
represented by the same hyperlink; (3) a third criteria score calculated by comparing the target 
web pages based on a plurality of links for the same link source page and the same hyperlink; 
and (4) a fourth criteria score calculated by comparing the hyperlink and the target web page in 
the contents, the present invention is not limited to that example. In another embodiment, the 
criteria score may be calculated according to at least one of the above criteria scores, or 
according to the weighted criteria scores based on each of the conditions. The present 
invention is not limited to the above embodiments of the method. 

An embodiment of the detection of the expired link 

The operations of the condition detecting unit 13 and the candidate providing unit 12 in 
the detection of the expired link will be described in detail below with reference to 
FIGS. 4 and 16 of the drawings. 

Firstly, the condition detecting unit 13 is operated to extract links including dated 
expressions in the hyperlink thereof, or indicating documents including dated expressions. 
Then, the condition detecting unit 13 calculates the expiration date of the dated expression 
related to the extracted link, and judges whether the present date and time is prior to the 
expiration date or not (the step T21 in FIG. 16). 

In the following step T22 in FIG. 16, the condition detecting unit 13 is operated to 
extract the expired expression from the link target document related to the extracted link. In 
this embodiment, the expired expression means an expression commonly used for a notice 
sentence when the service is terminated, closed, or moved, such as "Closed.", "Moved.", 
"Ended.", "Automatically jump after a few seconds.", "effective in [date]", "We appreciated 
your past patronage.", "We appreciated your past participation.", and so on. Besides the above 
expired expressions, if the description in the HTML indicates that the document automatically 
jumps after a few seconds, this is extracted as the expired expression. 

In the following step T23 in FIG. 16, the condition detecting unit 13 calculates the 
criteria score of the link by integrating the result of the judgment in the step T21 as to 
whether the present date and time is prior to the expiration date or not, and the number of the 
expired expressions extracted in the step T22. When this criteria score is higher than or equal 
to a predetermined threshold, the link having the criteria score is outputted as the mismatched 
link. 

There may be provided an example of the method of calculating the criteria score of the 
link including the step of multiplying the number of dates obtained as the expired date and the 
number of appearances of the extracted expired expressions together. As for another 
embodiment, there may be provided a method of calculating the criteria score including the steps 
of: specifying a characteristic vector of the link based on the number of dates obtained as the 
expired date and the number of appearances of the extracted expired expressions; calculating a 
mean value of distances between the specified characteristic vector of the link and characteristic 
vectors of the mismatched link prepared as teaching data; and setting the mean value as the 
criteria score. The present invention is not limited to the embodiments described above. 

In the following step T24, the candidate providing unit 12 is operated to extract, from the 
link target document, the new address to which the document has moved for the link outputted as 
the mismatched link, and to specify the new address as the correction candidate. In this 
embodiment, the new address means an address to which the document automatically jumps in 
accordance with the HTML. Instead of the automatic jump of the document, the expression "Click 
here" or "Move to the following URL." may be extracted. Then, the target address of a link 
included in the expression or written in the periphery of the expression may be specified to be 
the correction candidate as the new address. When, on the other hand, the new address cannot be 
extracted, the correction candidate may be outputted as "link: delete". 

An example of the operations of the condition detecting unit 13 and the candidate 
providing unit 12 will be described below with reference to FIG. 4A. Here, the 
method of calculating the criteria score of the link including the step of multiplying the number 
of dates obtained as the expired date and the number of appearances of the extracted expired 
expressions together, as described above, is used. 

Referring also to the step T21 of FIG. 16, as the document 125 includes the dated 
expression such as "Jul. 20th, 2002 to Aug. 31st, 2002.", the condition detecting unit 13 is 
operated to extract the links 221, 222, 223, and 224. Assuming that the present date is Aug. 
15th, 2002, the condition detecting unit 13 judges that the present date is prior to the 
expiration date of the document 125, thereby judging that the links 221, 222, 223, and 224 are 
not expired. 

In the next step T22 of FIG. 16, nothing is extracted, as the document 125 does not 
include any expired expression. 

With the result obtained in the step T21 that the present date is prior to the expiration 
date, and the result obtained in the step T22 that no expired expressions are extracted, both 
the number of dates obtained as the expired date and the number of appearances of the extracted 
expired expression are calculated to be "0". Therefore, the criteria scores of the links 221, 
222, 223, and 224 become "0" obtained by an equation "0×0=0". Therefore, it is judged in the 
next step T23 of FIG. 16 that all of the links 221, 222, 223, and 224 are appropriate. 

Another example of the operations of the condition detecting unit 13 and the candidate 
providing unit 12 will be described below with reference to FIG. 4B. 

Referring also to the step T21 of FIG. 16, as the document 125 includes the dated 
expression such as "Jul. 20th, 2002 to Aug. 31st, 2002.", the condition detecting unit 13 is 
operated to extract the link 224. Assuming that the present date is Sep. 15th, 2002, the 
condition detecting unit 13 judges that the present date is over the expiration date of the 
document 125, thereby judging that the link 224 is expired. 

In the next step T22 of FIG. 16, the condition detecting unit 13 is operated to extract the 
expired expression such as "Closed.". 

With the result obtained in the step T21 that the present date is past the expiration date,
and the result obtained in the step T22 that the expired expression such as "Closed." is extracted,
the number of dates obtained as the expired date (that is, the number of days elapsed since the
expiration date) is calculated to be "15", and the number of appearances of the extracted expired
expression is calculated to be "1". Accordingly, the criteria score of the link 224 is "15",
obtained by the equation "15 x 1 = 15". Therefore, when the threshold is set as "10", it is
judged that the link 224 is the mismatched link.
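
The two worked examples above can be reproduced with a short sketch. It assumes that the
"number of dates obtained as the expired date" is the number of days elapsed since the expiration
date (which matches the value "15" in the example) and that a link is judged mismatched when its
score exceeds the threshold; the function and variable names are illustrative only.

```python
from datetime import date

def expired_link_score(present, expiration, expired_expression_count):
    """Criteria score = days elapsed since expiration x count of expired expressions."""
    days_past = max(0, (present - expiration).days)  # 0 while the date is not yet past
    return days_past * expired_expression_count

THRESHOLD = 10  # threshold used in the example; strict ">" comparison is an assumption

expiration = date(2002, 8, 31)
score_aug_15 = expired_link_score(date(2002, 8, 15), expiration, 0)  # nothing extracted
score_sep_15 = expired_link_score(date(2002, 9, 15), expiration, 1)  # "Closed." found once
is_mismatched = score_sep_15 > THRESHOLD
```

With these inputs the first score is 0 and the second is 15, so only the second case is flagged.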

In the next step T24 of FIG. 16, the candidate providing unit 12 is operated to extract the
new address. However, as the document 125, shown in FIG. 4B, does not include a
corresponding address, the candidate providing unit 12 cannot obtain the new address.
Therefore, the candidate providing unit 12 outputs "link: delete" as the correction candidate
of the link 224.

Although it is described in this embodiment that the expired link is detected by the dated
expression and the expired expression, the present invention is not limited to this method. For
example, the detecting method may, similarly to the detection of the error link described above,
include the steps of: grouping the links having the same target web page; and detecting sub-groups
having a different hyperlink in the same group. Furthermore, in another
embodiment, the detecting method may include the steps of: grouping the links having the same
hyperlink; and detecting the sub-groups having a different link target in
the same group.

An embodiment of the detection of the inconsistency in the hyperlinks

The operations of the condition detecting unit 13 and the candidate providing unit 12 for
the detection of the inconsistency in the hyperlinks will be
described in detail in the following, with reference to FIGS. 5, 17 and 18 of the drawings.

Firstly, the condition detecting unit 13 is operated to read out the link information from
the information storing unit 22 to divide the links into some groups in accordance with the link
information. The condition detecting unit 13 divides links having the same link target into a
same group. Then, the condition detecting unit 13 further divides the links in
the same group that have the same hyperlink into a same sub-group. Then,
the condition detecting unit 13 extracts the links which have a different
hyperlink. The condition detecting unit 13 is further operated to give a criteria
score to each link in accordance with the number of links included in the sub-group, in the step
T31 in FIG. 17.

FIG. 18 shows an example of the links extracted and the criteria scores given in the step
T31, when the relationship between documents is as shown in FIG. 5. It can be understood
from FIG. 18 that the links 231, 232, 233, and 234 are grouped as these links
have a same link target "document 135". The three links 231, 232, and 233 are further grouped
into a sub-group of the same hyperlink "GX Series", while the link 234 is
grouped into a sub-group of the hyperlink "gX Series".

The method of giving the criteria score includes the steps of: setting the criteria score
for each of the groups to "1"; setting the criteria score for each of the sub-groups to a value
which is obtained by distributing the criteria score of the group among the sub-groups in inverse
proportion to the number of links in each sub-group; and setting the criteria score for each of
the links to a value which is obtained by dividing the criteria score of the sub-group equally by
the number of the links in the sub-group. Therefore, the criteria score of each of the links 231,
232, and 233, given in the step T31 of FIG. 17, becomes "1/12" while the criteria score of the
link 234, also given in the step T31 of FIG. 17, becomes "3/4", as shown in FIG. 18.

The condition detecting unit 13 is then operated to compare the sums of the criteria
scores of the sub-groups, and then to extract the links having the higher criteria score
as a mismatched link. In FIG. 18, the criteria score "3/4" of the link 234 is higher than the
sum "1/4" of the criteria scores of the links 231, 232 and 233. Therefore, the link 234 is
extracted as the mismatched link.
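
The scoring rule above can be sketched as follows. The sketch interprets "inverse proportion" as
weighting each sub-group by 1/size and normalising so that the group total is 1, which reproduces
the values "1/12" and "3/4" of FIG. 18; the function name and data layout are assumptions made
for illustration.

```python
from collections import defaultdict
from fractions import Fraction

def criteria_scores(links):
    """links: dict of link id -> hyperlink text, for links sharing one link target.
    Returns (score per link, summed score per sub-group)."""
    subgroups = defaultdict(list)
    for link_id, text in links.items():
        subgroups[text].append(link_id)
    # Weight each sub-group by 1/size, then normalise so the group totals 1.
    total_weight = sum(Fraction(1, len(ids)) for ids in subgroups.values())
    subgroup_scores = {
        text: Fraction(1, len(ids)) / total_weight for text, ids in subgroups.items()
    }
    # Each link receives an equal share of its sub-group's score.
    link_scores = {
        link_id: subgroup_scores[text] / len(ids)
        for text, ids in subgroups.items()
        for link_id in ids
    }
    return link_scores, subgroup_scores

links = {231: "GX Series", 232: "GX Series", 233: "GX Series", 234: "gX Series"}
link_scores, subgroup_scores = criteria_scores(links)
mismatched_text = max(subgroup_scores, key=subgroup_scores.get)
mismatched_links = [i for i, text in links.items() if text == mismatched_text]
```

Exact fractions are used so that the scores can be compared with the figure without rounding.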

In the following step T32 in FIG. 17, the candidate providing unit 12 is operated to
investigate whether or not the hyperlink of the extracted links is registered in a
glossary. In this embodiment, the glossary means a table in which each fluctuation of
description of a word is a key and the expression into which it is to be unified is the value.
For example, the word "free software" means software available without admission, and has a
plurality of fluctuations of description, such as "free ware" and "free soft". When the
administrator can unify these words into the word "free software", the words "free ware" and
"free soft" are assumed to be the keys, and the word "free software" is assumed to be the value.
These words may be registered in the glossary.

When the hyperlink of the extracted link is already registered in
the glossary (YES in the step T32 in FIG. 17), the candidate providing unit 12 is operated to
output, as the correction candidate, the unified expression corresponding to the key, in the step
T33 in FIG. 17. In order to fully absorb fluctuations of description, a fuzzy search may be
performed when the key is searched. In another embodiment, the method of calculating the
correction candidate may include the steps of: conducting a fuzzy search for the unified
expression without using the words of the fluctuation of description; judging whether or not the
affinity level between the character strings is higher than or equal to a threshold; and taking
the searched unified expression as the correction candidate when the affinity level between the
character strings is judged to be higher than or equal to the threshold.
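
A glossary lookup with a fuzzy fallback might look like the sketch below. The glossary contents
come from the "free software" example; the use of Python's difflib.SequenceMatcher ratio as the
"affinity level" and the threshold of 0.8 are assumptions chosen for illustration, not values
given in the specification.

```python
from difflib import SequenceMatcher

# Key = fluctuation of description, value = unified expression (from the example above).
GLOSSARY = {"free ware": "free software", "free soft": "free software"}

def correction_candidate(hyperlink_text, threshold=0.8):
    """Return the unified expression for a hyperlink text, or None if nothing matches."""
    text = hyperlink_text.lower()
    if text in GLOSSARY:                     # exact key match
        return GLOSSARY[text]
    best_key, best_ratio = None, 0.0
    for key in GLOSSARY:                     # fuzzy search over the registered keys
        ratio = SequenceMatcher(None, text, key).ratio()
        if ratio > best_ratio:
            best_key, best_ratio = key, ratio
    return GLOSSARY[best_key] if best_ratio >= threshold else None

exact = correction_candidate("free ware")
fuzzy = correction_candidate("freeware")     # not registered, but close to "free ware"
unregistered = correction_candidate("GX Series")
```

An unregistered text such as "GX Series" yields no candidate, which corresponds to the NO branch
of the step T32.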

When, on the other hand, the hyperlink of the extracted link is
not registered in the glossary (NO in the step T32 in FIG. 17), the candidate providing unit 12
provides the correction candidate to harmonize the
hyperlink having the higher criteria score with that of the lower criteria score in the same group,
in the step T34 in FIG. 17. In the case shown in FIG. 18, the candidate providing unit 12 outputs
"hyperlink: GX Series" as the correction candidate.

It is assumed here that neither of the words "GX Series" and "gX Series", shown in FIG. 18,
is registered in the glossary.

Although it is described in this embodiment that the criteria score is calculated, for
example, based on the number of the links in each of the sub-groups, the present invention is not
limited to the embodiments described above. In another embodiment, there is provided a
method of calculating the criteria score having the steps of: specifying a characteristic vector of
the link based on the number of links included in the sub-group; calculating a mean value of
distances between the specified characteristic vector of the link and characteristic vectors of the
mismatched link prepared as teaching data; and setting the mean value as the criteria score.
The present invention is not limited to the embodiments described above.

An embodiment of the detection of the inconsistency in the styles of the hyperlinks

The operations of the condition detecting unit 13 and the candidate providing unit 12 for
the detection of the disunity in the style of the hyperlink will be described
in detail in the following, with reference to FIGS. 6, 19 and 20 of the drawings.

Firstly, the condition detecting unit 13 is operated to read the link information from the
information storing unit 22 to divide the links into some groups in accordance with the link
information. The condition detecting unit 13 divides links having the same link source
document into a same group. Then, the condition detecting unit 13 further divides the links
in the same group that have the same target attribute into a same sub-group.
Then, the condition detecting unit 13 extracts the links which have a different target attribute.
The condition detecting unit 13 is further operated to give a criteria score to each link in
accordance with the number of links included in the sub-group, in the step T41 in FIG. 19.

FIG. 20 shows an example of the links extracted and the criteria scores given in the step
T41 in the case where the relation between the documents is as shown in FIG. 6. It can be
understood from FIG. 20 that the links 241, 242, 243, and 244 are grouped as these links have a
same link source "document 141". The three links 241, 242, and 243 are further grouped into a
sub-group of the same target attribute "_blank", while the link 244 is grouped into a sub-group of
the target attribute "not specified".

The method of giving the criteria score includes the steps of: setting the criteria score
for each of the groups to "1"; setting the criteria score for each of the sub-groups to a value
which is obtained by distributing the criteria score of the group among the sub-groups in inverse
proportion to the number of links in each sub-group; and setting the criteria score for each of
the links to a value which is obtained by dividing the criteria score of the sub-group equally by
the number of the links in the sub-group. Therefore, as shown in FIG. 20, in the step T41, the
criteria score of each of the links 241, 242, and 243 becomes "1/12", while the criteria score of
the link 244 becomes "3/4".

The condition detecting unit 13 is then operated to compare the sums of the criteria
scores of the sub-groups, and then to extract the links having the higher criteria score
as a mismatched link. In FIG. 20, the criteria score "3/4" of the link 244 is higher than the
sum "1/4" of the criteria scores of the links 241, 242 and 243. Therefore, the link 244 is
extracted as the mismatched link.

In the following step T42 in FIG. 19, the candidate providing unit 12
provides the correction candidate to harmonize the target attribute having the higher
criteria score with that of the lower criteria score in the same group. In the case shown in FIG.
20, the candidate providing unit 12 outputs "target attribute: _blank" as the correction candidate.
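
The harmonizing step can be illustrated with a simplified sketch. Instead of recomputing the
criteria scores, it treats the most common target attribute in the group as the value to
harmonize toward, which gives the same result as the scoring rule in the two-sub-group case of
FIG. 20; the function name and the use of None for "not specified" are assumptions.

```python
from collections import Counter

def target_attribute_candidates(target_attributes):
    """target_attributes: dict of link id -> target attribute (None if not specified),
    for links sharing one link source document.
    Returns a correction candidate for every link that deviates from the majority."""
    counts = Counter(target_attributes.values())
    majority_value, _ = counts.most_common(1)[0]
    return {
        link_id: f"target attribute: {majority_value}"
        for link_id, value in target_attributes.items()
        if value != majority_value
    }

links = {241: "_blank", 242: "_blank", 243: "_blank", 244: None}
candidates = target_attribute_candidates(links)
```

For the links of FIG. 20, only the link 244 receives a candidate.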

Although it is described in this embodiment that the targets to be grouped in the step
T41 of FIG. 19 are the links having the same link source document, the present invention is
not limited to this embodiment. In another embodiment, there may be provided a method
including the step of grouping the links having a same hyperlink and
included in a particular area, such as a table or a list of links, into a same group. In another
embodiment, there may be provided a method including the steps of: grouping the links among a
plurality of documents, such as a particular document and the documents stored in the same
directory as the particular document, based on the style; and detecting the disunity in the link
style of the pages peripheral to the particular document.

Although the method of detecting the disunity in the target attribute and
calculating the correction candidate has been described in this embodiment, a similar method of
detecting disunity in style attributes and calculating the correction candidate may be provided.

In this embodiment, the criteria score is calculated, for example, based on the number of
the links in each of the sub-groups. The present invention is not limited to this embodiment.
In another embodiment, there is provided a method of calculating the criteria score having the
steps of: specifying a characteristic vector of the link based on the number of links in the
sub-group; preparing characteristic vectors of mismatched links as teaching data; and calculating
a mean of the distances between the characteristic vector of the link and the characteristic
vectors of the mismatched links to obtain the criteria score.
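
As a rough illustration of this alternative, the sketch below builds a two-element
characteristic vector from the sub-group size and scores a link by its mean Euclidean distance
to teaching vectors. The choice of features, the distance measure, and the teaching values are
all hypothetical; the specification does not fix them.

```python
import math

def characteristic_vector(subgroup_size, group_size):
    # Hypothetical features: sub-group size and its share of the whole group.
    return (float(subgroup_size), subgroup_size / group_size)

def mean_distance_score(vector, teaching_vectors):
    """Mean Euclidean distance from `vector` to each teaching vector."""
    return sum(math.dist(vector, t) for t in teaching_vectors) / len(teaching_vectors)

# Made-up teaching data: vectors of links previously confirmed as mismatched.
teaching = [(1.0, 0.25), (1.0, 0.2)]

score_minority = mean_distance_score(characteristic_vector(1, 4), teaching)  # e.g. link 244
score_majority = mean_distance_score(characteristic_vector(3, 4), teaching)  # e.g. link 241
```

Under these assumptions a smaller score means the link lies closer to the known mismatched
examples, so the lone link scores lower than the links in the larger sub-group.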

An embodiment of the detection of the phantom link 

The operations of the condition detecting unit 13 and the candidate providing unit 12 in
the detection of the phantom link will be described in detail in the following with reference to
FIGS. 7 and 21 of the drawings.

Firstly, the condition detecting unit 13 is operated to read out the link information from
the information storing unit 22 and, according to the link information, to extract the link having
an invisible hyperlink, in the step T51 in FIG. 21. In this embodiment, the
invisible hyperlink means a null character string, a transparent image, an
infinitesimally small image or character, or an image or character which is the same
color as that of a background. In FIG. 7A, the link having a hyperlink
specifying a null character string is extracted.

In the following step T52 in FIG. 21, the candidate providing unit 12 is operated to 
output the correction candidate so as to delete the link as "link: delete". 
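
A minimal sketch of the phantom link check follows. It covers three of the listed conditions
(null character string, very small character, and text in the background color) and omits the
transparent-image case; the dictionary keys, the one-pixel size limit, and the link identifiers
are hypothetical.

```python
def is_phantom(link):
    """link: dict with hypothetical keys 'text', 'font_size_px', 'color', 'background'."""
    text = link.get("text", "")
    if text.strip() == "":
        return True                              # null character string
    if link.get("font_size_px", 16) <= 1:
        return True                              # infinitesimally small character
    color = link.get("color")
    return color is not None and color == link.get("background")  # same as background

links = [
    {"id": "a", "text": ""},
    {"id": "b", "text": "Products", "color": "#000000", "background": "#ffffff"},
    {"id": "c", "text": "hidden", "color": "#ffffff", "background": "#ffffff"},
]
phantom_ids = [link["id"] for link in links if is_phantom(link)]
correction_candidates = {link_id: "link: delete" for link_id in phantom_ids}
```

Each detected link is paired with the "link: delete" candidate described in the step T52.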

An embodiment of the detection of the loop link

The operations of the condition detecting unit 13 and the candidate providing unit 12 for
the detection of the loop link or looped link will be described in detail in the following, with
reference to FIGS. 8 and 22 of the drawings.

Firstly, the condition detecting unit 13 is operated to read out the link information from
the information storing unit 22, and to separate the hyperlink of each link read
from the information storing unit 22 into words, in the step T61 in FIG. 22. The
hyperlink may be separated into words by conducting a
morphological analysis, by separating the hyperlink at each change in the sort of
characters, or by separating the hyperlink at every several letters.

In the following step T62 in FIG. 22, the condition detecting unit 13 is operated to
extract, as the loop links, a group of links which form a loop and whose
hyperlinks include identical words. In FIG. 8, all of the links 261, 262 and 263 including
a word "present" form a loop, and therefore are assumed to be loop links to be outputted.
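
The extraction of loop links sharing a word can be sketched as a depth-first walk over the
links. Whitespace splitting stands in for the morphological analysis mentioned above, and the
document names "A" to "D", the hyperlink texts, and the link 264 are invented for the example.

```python
def find_loop_links(links):
    """links: list of (link_id, source_doc, target_doc, hyperlink_text).
    Returns sorted ids of links lying on a loop whose hyperlinks all share a word."""
    outgoing = {}
    for link in links:
        outgoing.setdefault(link[1], []).append(link)
    found = set()

    def walk(doc, start, path, common):
        for link_id, _, target, text in outgoing.get(doc, []):
            if link_id in path:
                continue                     # never reuse a link within one walk
            words = set(text.lower().split())
            shared = words if common is None else common & words
            if not shared:
                continue                     # no word common to every link so far
            if target == start:
                found.update(path + [link_id])
            else:
                walk(target, start, path + [link_id], shared)

    for start in outgoing:
        walk(start, start, [], None)
    return sorted(found)

links = [
    (261, "A", "B", "Present campaign"),
    (262, "B", "C", "Present details"),
    (263, "C", "A", "Apply for present"),
    (264, "A", "D", "Company profile"),
]
loop_links = find_loop_links(links)
```

The walk is exhaustive and therefore only suitable for small link sets; a production
implementation would likely use a dedicated cycle-detection algorithm.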

Although this embodiment describes the method of extracting the loop links in
which all of the hyperlinks include the same word, the present invention
is not limited to this embodiment. In another embodiment, there may be provided a method
including the steps of: preparing a dictionary including characteristic words classified under each
of the specific topics; and extracting the loop links by judging whether each of the
hyperlinks includes the characteristic words classified for the same topic. The
present invention is not limited to the embodiments described above.

A method of detecting mismatched link focused on a change with time 

Although this embodiment describes the method of detecting some kinds of the
mismatched links based on the link information of each of the links collected at a same time, the
present invention is not limited to this embodiment. In another embodiment, there may be
provided the method of detecting all kinds of mismatched links including the steps of: repeating
the collection of the link information periodically; and detecting all kinds of mismatched links by
focusing on a change in the link information in accordance with time. The operations of the
condition detecting unit 13 and the candidate providing unit 12 in the method of detecting
the mismatched link focused on a change in accordance with time will be described in the following
with reference to FIGS. 1, 4, 23 and 24 of the drawings.

The information storing unit 22, shown in FIG. 1, is adapted to store therein the link
information at times T and T'.

Firstly, referring to T71 in FIG. 23, the condition detecting unit 13 is operated to group
the links which are the same in at least one item of the link information at times T and T'. FIG.
24 shows an example of the links grouped into a group of the link target "document 125" in
accordance with the link information on Aug. 15th, 2002, and on Sep. 15th, 2002, when
the relationship of the documents is as shown in FIG. 4.

In the following step T72 in FIG. 23, a link remaining in a group in which many of the links
have changed in the link information is extracted as the mismatched link. In the case of FIG. 24,
there are four links of the link target "document 125" on Aug. 15th, 2002, but there is only
one link of the link target "document 125" on Sep. 15th, 2002. Therefore, the link 224
is extracted as the mismatched link.
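
The comparison of the two collections can be sketched with set operations. The rule used here,
flagging the surviving links when more links were deleted from a group than remain in it, is one
possible reading of "many links varied" and is an assumption; the snapshot layout is also
illustrative.

```python
def links_left_behind(snapshot_t, snapshot_t2):
    """Each snapshot: dict of link id -> link target.
    Returns ids of links that remain in a group from which most links were deleted."""
    flagged = []
    for target in set(snapshot_t.values()):
        before = {i for i, t in snapshot_t.items() if t == target}
        after = {i for i, t in snapshot_t2.items() if t == target}
        deleted = before - after
        remaining = before & after
        if len(deleted) > len(remaining):    # assumed meaning of "many links varied"
            flagged.extend(remaining)
    return sorted(flagged)

aug_15 = {221: "document 125", 222: "document 125",
          223: "document 125", 224: "document 125"}
sep_15 = {224: "document 125"}
mismatched = links_left_behind(aug_15, sep_15)
```

For the two collections of FIG. 24, three links were deleted and one remains, so the link 224 is
returned.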

In the following step T72 in FIG. 23, the candidate providing unit 12
provides the correction candidate to compensate for the change caused between the times T
and T'. Referring to FIG. 23, because the other links 221, 222, and 223 in the group were deleted
between Aug. 15th, 2002 and Sep. 15th, 2002, the candidate
providing unit 12 provides "link: delete" as the correction candidate.

As described above, in this embodiment, the links having the same link target document
at times T and T' are respectively grouped as a same group, and when there is a change among
some of the links included in the same group between the times T and T', the rest of the link(s) in
the group is(are) extracted as the mismatched link. Although it is described in this embodiment
that the change is that some of the links are deleted, the change is not limited to that example.
For example, when there is a change in the link target document for some of the links, the
candidate providing unit 12 may provide a correction candidate that prompts the user to correct
the hyperlink.

Although it is described in this embodiment that the links having the same link target
document at times T and T' are respectively grouped as a same group, the present invention is
not limited to this embodiment. In another embodiment, there may be provided a method
including the steps of: grouping links having a same hyperlink as a same
group; and detecting a change in the style or target attribute.

The effect of this embodiment will be described in the following.

In this embodiment, all kinds of logical mismatches can be detected. More specifically,
in this embodiment, the kinds of detectable logical mismatches may include: (1) putting a link to
a wrong destination or target; (2) putting a link to the expired information; (3)
inconsistency in the hyperlinks; and (4) inconsistency in the
styles of the hyperlinks, as the mismatched link detecting method
includes the steps of: extracting the link information from the hypertext database; grouping the
links by each item of the link information; and detecting the particular link excluded from the
group to consider it as a mismatched link. The logical mismatches, such as (2) the link to the
expired information, may be detected by repeating the collection of the link information
periodically, and focusing on a change in the link information in accordance with time.

Furthermore, (5) the phantom link, as one example of the logical mismatches, may be
detected by detecting the link having no hyperlink, and (6) the loop link,
as another example of the logical mismatches, may be detected by detecting the links that are
included in a group of links forming a loop and whose hyperlinks
are relevant to a common topic.

In this embodiment, the correction candidate of the logical mismatch can be provided
for the administrator. More specifically, the candidate correcting method may include a process
of automatically calculating the correction candidate so as to harmonize the link information of
the particular link excluded from the group with the link information of the rest of the links in the
group. Therefore, it is unnecessary for the administrator to consider how to correct the
mismatched links, and further it is possible to automatically reflect the correction.

Furthermore, the grouped mismatched links can be collectively displayed on a display
screen in this embodiment. Therefore, all the administrator has to do is to confirm a part of
the links, thereby making it possible to judge whether the remaining links are mismatched or not.
Therefore, the efficiency of checking by the administrator can be considerably enhanced.

In this embodiment, there may be provided a display screen on which is displayed a list
sorted by each of three items including: (1) a hyperlink; (2) identification
information about a link-source web page; and (3) identification information about a target
web page. Therefore, the administrator can grasp the correction items for each page, intensively
examine mismatches relating to a key page, and examine the suitability of the expression which is
used for the hyperlink.

In this embodiment, the data processing unit 1 includes the information collecting unit
11, but this information collecting unit 11 may be omitted from the data processing unit 1, as the
collection and storage of information about a page and link from the hypertext database 21, which
is performed by the information collecting unit 11 in this embodiment, may be performed by
another data processing unit, not shown.

Furthermore, the correction reflecting unit 14 in this embodiment may be omitted from
the data processing unit 1, when the administrator can correct the mismatched parts in the
hypertext database 21 by hand while viewing a display screen of a list of the results shown in
FIG. 13. Even if there is no information about a kind of mismatched link or the correction
candidate, the administrator can derive a correction candidate from the other information
shown on the display screen in FIG. 13.
Therefore, the candidate providing unit 12 in this embodiment may be omitted from the data
processing unit 1.

Second preferred embodiment 

Referring now to FIG. 25 of the drawings, there is shown a second preferred
embodiment of the hypertext checking apparatus according to the present invention.

As shown in FIG. 25, the data processing unit 5 includes the same constitutional
elements as those of the data processing unit 1 shown in FIG. 1 in the first embodiment. In
addition, the data processing unit 5 of this embodiment includes an importance calculating unit
15.

The importance calculating unit 15 is adapted to calculate an importance value for the
mismatched link extracted by the condition detecting unit 13 in accordance with an access
frequency to the document in the detected mismatched link, or a seriousness of the mismatched
link, and to output the calculated importance values with ranks.

The operation of the data processing unit 5 in this embodiment will be described in the
following with reference to the drawings.

The operations of the information collecting unit 11 and the condition detecting unit 13
of this embodiment, shown in the steps S1 to S3 in FIG. 26, are the same as those of the
information collecting unit 11 and the condition detecting unit 13 of the first embodiment shown
in FIG. 10, and the description of these steps is therefore omitted. Then, in the step S4, the
candidate providing unit 12 provides a correction candidate so as to eliminate the mismatch in
the link extracted by the condition detecting unit 13 as the mismatched link, which is the same as
the step S4 of the first embodiment shown in FIG. 10. Then, instead of the step S5 of the first
embodiment shown in FIG. 10, control is passed to the importance calculating unit 15 for
having the importance calculating unit 15 calculate the importance value for the mismatched link,
shown as the step S8 in FIG. 26.

The importance calculating unit 15 calculates the importance
value of the link extracted as the mismatched link by the condition detecting unit 13, and
outputs the calculated importance value as a ranking list, shown as the steps S8 and S9 in FIG. 26.
In this embodiment, the importance value may be calculated based on at least one factor or a
combination of a plurality of factors including: (1) a sort of errors and unsuitability of the
detected parts; (2) accuracy of errors and unsuitability of the detected parts; (3) the number of
targeted links of the page including the detected parts; (4) a record of the frequency of access by
users to the page including the detected parts; and (5) a stratification level in the hypertext of
the page including the detected parts.

Referring to FIG. 27 of the drawings, there is shown a display screen including the
ranking list of the outputted mismatched links. The ranking list of the display screen shown in
FIG. 27 includes the "importance value" in addition to the "kinds of mismatch", the "correction
candidate" and so on, which are also included in the list in FIG. 13. More specifically, this
importance value of the mismatched link is obtained by grouping the links having the same link
targets and the same hyperlinks as a same group, and calculating the
importance value of the mismatched links for each of the groups, in addition to the kinds of
mismatch and the correction candidate. The importance values of the mismatched links thus
obtained are listed in descending order, so that the group having the higher importance value is
listed above. The administrator is capable of performing the step S6 in FIG. 26, in which the
confirmation and re-writing of the correction candidate is conducted, while referring to the
ranking list. As the ranking list includes the importance values listed in the order described
above, the administrator can easily conduct the step S6 in FIG. 26.
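
A ranking list of this kind might be produced as in the sketch below. The specification lists
the factors but not how they are combined, so the product of a severity weight and an access
count, the weight values, and the access counts are all hypothetical.

```python
# Hypothetical severity weights per kind of mismatch (not taken from the specification).
SEVERITY = {"error link": 3.0, "expired link": 2.0, "hyperlink inconsistency": 1.0}

def importance(kind, access_count):
    """Assumed combination: severity of the kind of mismatch x access frequency."""
    return SEVERITY[kind] * access_count

def ranking_list(groups):
    """groups: list of dicts with 'links', 'kind', 'access_count', 'candidate'.
    Returns the groups with an 'importance' value, highest first."""
    ranked = [dict(g, importance=importance(g["kind"], g["access_count"])) for g in groups]
    return sorted(ranked, key=lambda g: g["importance"], reverse=True)

groups = [
    {"links": [224], "kind": "expired link", "access_count": 120,
     "candidate": "link: delete"},
    {"links": [234], "kind": "hyperlink inconsistency", "access_count": 300,
     "candidate": "hyperlink: GX Series"},
]
ranked = ranking_list(groups)
```

With these made-up numbers the frequently accessed inconsistency outranks the expired link even
though its severity weight is lower.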

After that, in the following step S7 in FIG. 26, the correction reflecting unit 14 reflects
the correction for each of the documents in the hypertext database 21 in accordance with the
confirmed or corrected correction candidate. This step is conducted in the same manner as in the
first embodiment.

Although it is described in this embodiment that the importance calculating unit 15
calculates the importance value of the mismatched link and outputs the
calculated importance value as a ranking list after the candidate providing unit 12
provides the correction candidate, the present invention is not limited to this embodiment.
The order of the processes may be changed arbitrarily. For example, in another embodiment, the
importance calculating unit 15 may be operated to calculate the importance value of the
mismatched link and to output the calculated importance value as a ranking list before the
candidate providing unit 12 provides the correction candidate.

Although it is described in this embodiment that the administrator performs the
confirmation of the outputted mismatched link and correction candidate, in the step S6 in FIG. 26,
the present invention is not limited to this embodiment. In another embodiment, the step S6 may
be omitted and the steps S1 through S7 may be automatically performed.

Although it is described in this embodiment that the administrator decides a timing of
confirmation, the present invention is not limited to this embodiment. For example, in another
embodiment, the collection conditions and the extraction conditions may be determined in advance,
and the steps S1 to S4, S8, and S9 may be performed automatically and periodically. In
this case, the administrator may be informed of the results by electronic mail or the like.

The collection and storage of information about a page and a link from the hypertext
database 21, which is performed by the information collecting unit 11 shown in FIG. 25 in this
embodiment, may be performed by another data processing unit, which is not shown in the
drawings. In such a case, the data processing unit 5 shown in FIG. 25 of this embodiment
does not need to include the information collecting unit 11. Furthermore, the
administrator can correct the mismatched parts in the hypertext database 21 by hand
while viewing a display screen of a list of the results shown in FIG. 27. In such a case, the
data processing unit 5 shown in FIG. 25 of this embodiment does not need to include the
correction reflecting unit 14.

Furthermore, the administrator can select a correction candidate by himself/herself with
the help of the information shown in the list of the display screen in FIG. 27 even if the list
does not include a kind of mismatched link and the correction candidate. In such a case, the data
processing unit 5 shown in FIG. 25 of this embodiment does not need to include the candidate
providing unit 12.

Third preferred embodiment 

Referring now to FIG. 28 of the drawings, there is shown a third preferred embodiment
of the hypertext checking apparatus according to the present invention.

As shown in FIG. 28, the data processing unit 6 of the third embodiment includes the
same constitutional elements as those of the data processing unit 5 shown in FIG. 25 in the
second embodiment. The data processing unit 6 of this embodiment is different from the data
processing unit 5 shown in FIG. 25 in including a total score calculating unit 16 instead of the
correction reflecting unit 14.

The total score calculating unit 16 is adapted to calculate a total score of the targeted site
based on the mismatched link detected by the condition detecting unit 13 and the importance
value of the mismatched link calculated by the importance calculating unit 15. In this
embodiment, in addition to the sum of the importance values of the mismatched links calculated
by the importance calculating unit 15, the total score may also be calculated based on the number
of the mismatched links or the ratio of the number of mismatched links to the total number of links.
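
The three quantities named above can be computed together, as in this sketch. How they are
combined into a single total score is not stated, so the function simply returns all three; the
function name, the importance values, and the link count are illustrative.

```python
def total_scores(importance_values, total_links):
    """importance_values: importance value of each mismatched link on the targeted site.
    Returns the three quantities on which the total score may be based."""
    mismatched = len(importance_values)
    return {
        "importance_sum": sum(importance_values),
        "mismatched_count": mismatched,
        "mismatched_ratio": mismatched / total_links if total_links else 0.0,
    }

scores = total_scores([300.0, 240.0], total_links=40)
```

Recording these values at each periodic run would give the time series needed to follow the
quality of the targeted site.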

The operation of the hypertext checking apparatus according to this embodiment
will be described below with reference to the drawings.

The operations of the information collecting unit 11, the candidate providing unit 12, the
condition detecting unit 13, and the importance calculating unit 15 of this embodiment, shown in
steps S1 to S4 and S8 in FIG. 29, are the same as those of the second embodiment shown in
FIG. 26, and the description of these steps is therefore omitted.

In the second embodiment described above, the correction is reflected in the hypertext
database 21 in accordance with the correction candidate after the mismatched link is detected.
In this embodiment, as shown in step S10 in FIG. 29, the total score calculating unit 16
calculates the total score of the targeted site based on the importance value calculated by the
importance calculating unit 15 after the mismatched link is detected in step S3. Then, the total
score calculating unit 16 outputs the calculated total score.

The total score calculating unit 16 may perform this calculation periodically and output
the calculated total score each time. FIG. 30 shows the total scores output in this way over
time.

With these results, it is possible to see the progress of improvement in the quality of the
targeted site. Referring to FIG. 30, the rise in the total score becomes saturated as time goes
on. It is understood from this result that the process for improving the quality of the targeted
site is coming to an end.
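
The embodiment does not specify how saturation of the total score is to be recognized. As
one hypothetical possibility only, the most recent periodic scores could be compared with one
another, as in the Python sketch below. The function name `has_saturated` and the parameters
`window` and `tolerance` are assumptions made for illustration.

```python
from typing import Sequence


def has_saturated(scores: Sequence[float], window: int = 3, tolerance: float = 1.0) -> bool:
    """Return True when each of the last `window` changes in score is within `tolerance`.

    scores is the series of total scores output at successive calculation times,
    oldest first.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    if len(scores) < window + 1:
        return False  # not enough history to judge
    recent = scores[-(window + 1):]
    return all(abs(later - earlier) <= tolerance
               for earlier, later in zip(recent, recent[1:]))
```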

In this embodiment, the total score calculating unit 16 may calculate the total score at
regular intervals, and an alert may be issued when a predetermined condition is fulfilled, for
example, when the total score or the importance value of a part detected as a mismatched link
exceeds a predetermined threshold. With this function, the administrator is alerted when the
quality of the site declines.
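
The alert condition described above might be checked as in the following Python sketch,
which is offered as an illustration only. The function name `should_alert` and both threshold
parameters are hypothetical. The comparison uses "exceeds", following the wording of this
description.

```python
from typing import Iterable, Optional


def should_alert(total_score: float,
                 mismatch_importances: Iterable[float],
                 score_threshold: Optional[float] = None,
                 importance_threshold: Optional[float] = None) -> bool:
    """Return True when either configured threshold is exceeded.

    A threshold left as None is not checked.
    """
    if score_threshold is not None and total_score > score_threshold:
        return True
    if importance_threshold is not None:
        # Alert if any single mismatched link is important enough on its own.
        return any(value > importance_threshold for value in mismatch_importances)
    return False
```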

The total score calculating unit 16 may calculate the total score of each of a plurality of
different sites "A" to "M". FIG. 31 shows an example of the results output by the total score
calculating unit 16. Here, the results are listed in descending order. With these results, the
administrator can quantitatively compare the qualities of the sites. For example, it can be
seen from FIG. 31 that the quality of the site "A" is twice as high as that of the site "E".
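
A listing of this kind might be produced as in the Python sketch below, given by way of
illustration only. The function names are hypothetical, and the score values in the usage
note are invented for the example; they are not taken from FIG. 31.

```python
from typing import Dict, List, Tuple


def rank_sites(scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (site, total score) pairs in descending order of total score."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def score_ratio(scores: Dict[str, float], site_a: str, site_b: str) -> float:
    """Return how many times larger the score of site_a is than that of site_b."""
    return scores[site_a] / scores[site_b]
```

With the invented scores `{"A": 80.0, "E": 40.0, "C": 60.0}`, `rank_sites` lists the sites
in the order "A", "C", "E", and `score_ratio(scores, "A", "E")` gives 2.0.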

The effect of this embodiment will be described below.

In this embodiment, the total score of the quality of the targeted site is calculated based
on the number of detected mismatched links and the importance value. For this reason, it is
possible to grasp the progress of improvement in the quality of a site, and to quantitatively
compare the qualities of different sites.

Although the data processing unit 6 of this embodiment includes the information
collecting unit 11, the information collecting unit 11 may be omitted from the data processing
unit 6, because the collection and storage of information about pages and links from the
hypertext database 21, which is performed by the information collecting unit 11 in this
embodiment, may instead be performed by another data processing unit, not shown.

Although not described above, corrections of the detected mismatched parts may be
reflected in the hypertext database 21 upon request. In that case, the administrator may
correct the mismatched parts in the hypertext database 21 by hand while viewing the list of
results on the display screen shown in FIG. 27. Alternatively, a correction reflecting unit 14
similar to that of the second embodiment may be provided.

Even if there is no information about the kind of mismatched link or the correction
candidate, the administrator can derive a correction candidate from the other information
shown on the display screen in FIG. 27. Therefore, the candidate providing unit 12 in this
embodiment may be omitted from the data processing unit 6.

Fourth preferred embodiment 

The fourth preferred embodiment of the hypertext checking computer program product
according to the present invention will be described below with reference to the drawings.

The fourth preferred embodiment of the hypertext checking program product includes a
computer usable storage medium, not shown in the drawings, such as a CD-ROM, DVD-ROM,
MO, hard disk, EPROM, or EEPROM, having computer readable code embodied therein for
checking a hypertext. The computer readable code may alternatively be downloaded from a
network server, for example over the Internet.

Referring now to FIG. 32 of the drawings, there is shown one example of a system
including an input unit 501, a data processing unit 502, an output device 503, and a storage
device 504, which are similar to the constituent elements of the apparatus of the first preferred
embodiment. This system further includes a hypertext checking program 500 for carrying out
the function of the fourth preferred embodiment of the hypertext checking program product
according to the present invention, which is similar to that of the first embodiment of the
hypertext checking apparatus.

The input unit 501 is adapted to allow an operator to input an instruction therethrough.
The input unit 501 is, for example, a mouse or a keyboard. The output device 503 is adapted
to output a processing result from the data processing unit 502. The output device 503 is, for
example, a display screen of a display unit or a printer.

The hypertext checking program 500 is read out from the computer usable storage
medium to the data processing unit 502. The hypertext checking program 500 is then executed
by the data processing unit 502 to control the operation of the data processing unit 502, and to
create an input memory 505 and a working memory 506 in the storage device 504. The hypertext
checking program 500 can therefore implement, in the data processing unit 502, the functions of
the information collecting unit 11, the candidate providing unit 12, the condition detecting unit 13
and the correction reflecting unit 14 of the first embodiment of the hypertext checking apparatus
shown in FIG. 1. The data processing unit 502 thus constructed can perform the same steps as
those of the first embodiment by executing the hypertext checking program 500.

The data processing unit 502 and the storage device 504 shown in FIG. 32 correspond to
the data processing unit 1 and the storage device 2 shown in FIG. 1, respectively. In this
embodiment, the data processing unit 502 may be operated to access an external database by way
of a network, such as the Internet, in addition to the hypertext database 21, which is stored in
the storage device 2 shown in FIG. 1 and is the target of the check.

Fifth preferred embodiment 

The fifth preferred embodiment of the hypertext checking computer program product
according to the present invention will be described below with reference to the drawings.

The configuration of the fifth embodiment is shown in FIG. 32, which is the same figure
as that of the fourth embodiment described above. The fifth preferred embodiment of the
hypertext checking program product includes a computer usable storage medium, not shown,
having computer readable code embodied therein for checking a hypertext.

The hypertext checking program 500 is read out from the computer usable storage
medium to the data processing unit 502. The hypertext checking program 500 is then executed
by the data processing unit 502 to control the operation of the data processing unit 502, and to
create an input memory 505 and a working memory (or working area) 506 in the storage device
504. The hypertext checking program 500 can therefore implement, in the data processing unit
502, the functions of the information collecting unit 11, the candidate providing unit 12, the
condition detecting unit 13, the correction reflecting unit 14 and the importance calculating unit
15 of the second embodiment of the hypertext checking apparatus shown in FIG. 25. The data
processing unit 502 thus constructed can perform the same steps as those of the second
embodiment by executing the hypertext checking program 500.

The data processing unit 502 and the storage device 504 shown in FIG. 32 correspond to
the data processing unit 5 and the storage device 2 shown in FIG. 25, respectively. In this
embodiment, the data processing unit 502 may be operated to access an external database by way
of a network, such as the Internet, in addition to the hypertext database 21, which is stored in
the storage device 2 shown in FIG. 1 and is the target of the check.

Sixth preferred embodiment 

The sixth preferred embodiment of the hypertext checking computer program product
according to the present invention will be described below with reference to the drawings.

The configuration of the sixth embodiment is shown in FIG. 32, which is the same figure
as that of the fourth embodiment described above. The sixth preferred embodiment of the
hypertext checking program product includes a computer usable storage medium, not shown,
having computer readable code embodied therein for checking a hypertext.

The hypertext checking program 500 is read out from the computer usable storage
medium to the data processing unit 502. The hypertext checking program 500 is then executed
by the data processing unit 502 to control the operation of the data processing unit 502, and to
create an input memory (or input buffer) 505 and a working memory 506 in the storage device
504. The hypertext checking program 500 can therefore implement, in the data processing unit
502, the functions of the information collecting unit 11, the candidate providing unit 12, the
condition detecting unit 13, the importance calculating unit 15 and the total score calculating
unit 16 of the third embodiment of the hypertext checking apparatus shown in FIG. 28. The data
processing unit 502 thus constructed can perform the same steps as those of the third
embodiment by executing the hypertext checking program 500.

The data processing unit 502 and the storage device 504 shown in FIG. 32 correspond to
the data processing unit 6 and the storage device 2 shown in FIG. 28, respectively. In this
embodiment, the data processing unit 502 may be operated to access an external database by way
of a network, such as the Internet, in addition to the hypertext database 21, which is stored in
the storage device 2 shown in FIG. 1 and is the target of the check.

As described above, the following effects can be achieved according to the embodiments
of the present invention.

The present invention has a first advantage over the prior art in making it possible to
detect various logical mismatches. It is understood from the following description why the
present invention has the first advantage. According to the present invention, the kinds of
detectable logical mismatches include: (1) a link put to a wrong destination; (2) a link to
expired information; (3) inconsistency in the hyperlinks; and (4) inconsistency in the styles of
the hyperlinks, as the mismatched link detecting method includes the steps of: extracting
the link information from the hypertext database; grouping the links by each item of link
information; and detecting a particular link excluded from the group to be a mismatched link.
Logical mismatches such as (2) the link to expired information can be detected by repeating
the collection of the link information periodically, and focusing on a change in the link
information over time.
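
The grouping and detecting steps might be pictured, in a much simplified and non-limiting
form, as in the Python sketch below. The sketch assumes that links are grouped by the
hyperlink text and that a link is treated as excluded from its group when its destination
differs from the destination shared by a strict majority of the group. The `Link` type, the
function name, and the majority rule are assumptions made for illustration and are not taken
from the embodiments.

```python
from collections import Counter, defaultdict
from typing import Dict, List, NamedTuple


class Link(NamedTuple):
    source_page: str  # page containing the link
    anchor_text: str  # hyperlink text shown to the reader
    target_page: str  # link destination


def find_mismatched_links(links: List[Link]) -> List[Link]:
    """Group links by anchor text and flag links whose target differs from
    the target used by a strict majority of the group."""
    groups: Dict[str, List[Link]] = defaultdict(list)
    for link in links:
        groups[link.anchor_text].append(link)

    mismatched: List[Link] = []
    for group in groups.values():
        counts = Counter(link.target_page for link in group)
        top_target, top_count = counts.most_common(1)[0]
        # Without a strict majority the group has no clear norm, so nothing is flagged.
        if top_count * 2 <= len(group):
            continue
        mismatched.extend(link for link in group if link.target_page != top_target)
    return mismatched
```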

Furthermore, (5) the phantom link can be detected by detecting a link having no
hyperlink, and (6) the loop link can be detected by detecting the links included in a group of
links forming a loop and having hyperlinks corresponding to the group of links relevant to a
same topic.
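
As a simplified, non-limiting illustration, the two detections might be sketched in Python as
follows. The sketch assumes that a phantom link is a link whose hyperlink text is empty, and
it treats any link lying on a cycle of pages as a loop link. The further condition that the
links be relevant to a same topic is not modelled. All names are hypothetical.

```python
from collections import defaultdict, deque
from typing import DefaultDict, List, Set, Tuple

# (source_page, anchor_text, target_page); a hypothetical representation
Link = Tuple[str, str, str]


def phantom_links(links: List[Link]) -> List[Link]:
    """Return links whose anchor text is empty or whitespace only."""
    return [link for link in links if not link[1].strip()]


def loop_links(links: List[Link]) -> List[Link]:
    """Return links that lie on a cycle, i.e. whose source page can be
    reached again by following links from their target page."""
    graph: DefaultDict[str, Set[str]] = defaultdict(set)
    for source, _, target in links:
        graph[source].add(target)

    def reaches(start: str, goal: str) -> bool:
        # Breadth-first search over pages.
        seen = {start}
        queue = deque([start])
        while queue:
            page = queue.popleft()
            if page == goal:
                return True
            for nxt in graph.get(page, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return False

    return [link for link in links if reaches(link[2], link[0])]
```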

The present invention has a second advantage over the prior art in that the method of
correcting the mismatched links can be automatically determined, thereby making it unnecessary
for the administrator to consider how to correct the mismatched links. As the candidate
correcting method includes a process of automatically calculating the correction so as to
harmonize the link information of the particular link with the link information of the other links
in the group, the above advantage can be obtained.
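
One hypothetical way to calculate such a harmonizing correction is to propose the value
most common among the other links in the group, as in the Python sketch below. The function
name and the most-common-value rule are assumptions made for illustration only.

```python
from collections import Counter
from typing import Optional, Sequence


def correction_candidate(group_values: Sequence[str], outlier_value: str) -> Optional[str]:
    """Propose the value most common among the other links in the group.

    group_values holds one item of link information (for example the target
    page) for every link in the group, including the particular link.
    Returns None when no other value is available to harmonize with.
    """
    others = [value for value in group_values if value != outlier_value]
    if not others:
        return None
    return Counter(others).most_common(1)[0][0]
```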

The present invention has a third advantage over the prior art in that the checking
efficiency of the administrator can be considerably enhanced. As the grouped mismatched links
can be collectively displayed on a display screen, the administrator can confirm some of the
links, thereby making it possible to judge whether the remaining links are mismatched or not.

The present invention has a fourth advantage over the prior art in making it possible to
grasp the items to be corrected on every page, to examine a mismatch against a key page, and
to examine the suitability of an expression which is used for the hyperlink. As a display
screen may be provided displaying thereon a list having three items including: (1) a
hyperlink; (2) identification information about a link source web page; and (3) identification
information about a target web page, the above advantage can be obtained.

The present invention has a fifth advantage over the prior art in making it possible to
grasp the progress of improvement in the quality of a site, and to quantitatively compare the
qualities of different sites. As the total score of the quality of the targeted site is
calculated based on the number of detected mismatched links and the importance, the above
advantage can be obtained.